Preface


The Origin of This Book 


This text grew out of two types of real analysis courses taught by the author at Bard 
College, one for undergraduate mathematics majors, and the other for students in the 
mathematics section of Bard’s Masters of Arts in Teaching (M.A.T.) Program. Bard’s 
undergraduate real analysis course is a standard introductory course at the
junior-senior level, but the M.A.T. real analysis course, as explained below, is somewhat less
standard. The author was therefore unable to find an existing real analysis textbook 
that exactly met the needs of the students in the M.A.T. course, and so this text was 
written to fill the gap. To make this text more broadly useful, however, it has been 
written in a way that makes it sufficiently flexible to meet the needs of a standard 
undergraduate real analysis course as well, though with a few distinguishing features. 

One of the principles on which Bard’s M.A.T. Program was founded is that 
secondary school teachers need, in addition to sufficient training in pedagogy, a
substantial background in their subject areas. In the Bard M.A.T. Program in Mathematics,
not only are all students required to have completed the equivalent of a B.A. in
mathematics to enroll in the program, but they are required to take four mathematics courses
in the M.A.T. Program, one of which is in real analysis. The M.A.T. mathematics
courses are different from standard first-year mathematics graduate courses, in that
rather than directing the students toward more advanced mathematical topics, the
emphasis is on giving the students an advanced look at the material taught in secondary
school mathematics courses. For example, it is important for prospective teachers of 
calculus to have a good understanding of the properties of the real numbers (including 
decimal expansion), and a detailed look at logarithmic, exponential and trigonometric 
functions, none of which is usually treated in detail in standard undergraduate real 
analysis courses. Of course, a prospective teacher of calculus must also have a good 
grasp of limits, differentiation and integration, as found in any real analysis course. By 
contrast, it is not as important for prospective secondary teachers to spend valuable 
course time on some standard introductory real analysis topics such as sequences and 
series of functions. Hence, the focus of a real analysis course for M.A.T. students is 
somewhat different from a standard undergraduate real analysis course. 




This text contains all the material needed for both a standard introductory course 
in real analysis and for variants of such a course aimed at prospective teachers. It is 
the hope of this author that for each intended audience, this text will offer a clear, 
accessible and interesting exposition of this beautiful material. 


Audience 


This text is aimed at three target audiences: 


1. Mathematics majors taking a standard introductory real analysis course; 

2. Prospective secondary school mathematics teachers taking an introductory real 
analysis course; 

3. Prospective secondary school mathematics teachers taking a second real analysis 
course. 


For undergraduate mathematics majors taking an introductory real analysis course, 
this text covers all the standard topics that are typically treated in an introductory 
single-variable real analysis book. The order of the material is slightly different than 
usual (with sequences being treated after derivatives and integrals), and as a result a 
few of the proofs are different, but all the standard topics are present, as well as a few 
extras. 

For prospective secondary school mathematics teachers taking an introductory real 
analysis course, this text has, in addition to the standard topics one would encounter 
in any undergraduate real analysis course, a thorough treatment of the properties of 
the real numbers, and an equally thorough treatment of logarithmic, exponential and 
trigonometric functions. Additionally, the book contains some historical information 
that a mathematics teacher could use to enliven a calculus course. 

For prospective secondary school mathematics teachers taking a second real 
analysis course (for example, M.A.T. students in mathematics who have already had 
an undergraduate real analysis course), this text has, in addition to a review of the 
basic topics of real analysis (limits, derivatives, integrals, sequences), a development 
of the real numbers starting with the Peano Postulates, a detailed discussion of the 
decimal expansion of real numbers via least upper bounds, a thorough treatment 
of logarithmic, exponential and trigonometric functions, and additional topics not 
usually found in introductory real analysis texts (for example, a discussion of π in
terms of the circumference and area of circles, and a proof of the equivalence of
various theorems such as the Extreme Value Theorem and the Bolzano-Weierstrass
Theorem with the Least Upper Bound Property). It is the belief of this author that for 
those M.A.T. students who have already had an undergraduate course in real analysis, 
the proper training for prospective teachers is not to offer a course in more advanced 
topics in analysis (for example, Lebesgue measure and integration, or metric spaces), 
but rather to discuss in more detail those aspects of single-variable real analysis that 
are most directly related to the topics that teachers encounter in secondary schools.


Pedagogical Concerns


Regardless of any particular choices in the selection and order of the material in this 
text, at heart this text is a detailed and rigorous introduction to real analysis designed 
for students who have not previously studied the subject. 

Some of the pedagogical concerns of this text are as follows. 


Slow and Steady 


Though it is fun to rush straight to the most exciting results in a subject, and it is 
tempting to skip over the details of some proofs (either because they seem too routine 
or because they seem too long), in the author’s experience the best way for students 
to learn the basics of a technical subject such as real analysis is to work through all 
the details of the subject slowly and steadily. Students need to be challenged, but in 
an introductory text such as this one it is best to leave the challenges to the exercises, 
not the proofs of theorems. Most proofs in this text are written out in full detail, and 
when details are omitted, that is stated explicitly. When previous results in the text are 
used in a proof, those results are always referenced. Most other real analysis texts of 
this length cover more material than we do; our aim is not to fit as much material as 
possible into the book, but to provide sufficient material for a one-semester course 
(with a few options for the instructor), and to cover that material as thoroughly as 
possible. 


Careful Writing 


Every effort has been made to provide clearly and carefully written definitions, 
theorems and proofs throughout the book. As seen in the author’s book Proofs and 
Fundamentals: A First Course in Abstract Mathematics (Birkhauser, Boston, 2000), 
the author views the careful writing of proofs as an important part of both teaching 
and learning rigorous mathematics, and he has attempted to adhere to the advice he 
gave about writing in that book. 


Minimal Technicalities 


Though real analysis is technical by nature, this text attempts to keep technical 
concepts to a minimum. For example, we omit discussion of limit inferior and limit 
superior, because we can accomplish everything we need without it. 

When technicalities are kept to a minimum, the result is that some particularly 
slick proofs are not available, which is unaesthetic to the experienced mathematician. 
For the sake of student learning, however, it is better to use a minimum number of
technicalities repeatedly than to have the shortest or cleverest proof of each theorem.
For example, there are some theorems involving continuity, differentiation and
integration (such as the Extreme Value Theorem and the Intermediate Value Theorem)
that can be proved very efficiently by using sequences, but which we prove using only 
the basic properties of the real numbers (and in particular the Least Upper Bound 
Property).

Another example of keeping technicalities to a minimum is our choice of Dedekind
cuts rather than Cauchy sequences for constructing the real numbers from the rational 
numbers; the method of Dedekind cuts is slightly longer, but it avoids both sequences 
and equivalence classes. (Of course, there is a discussion of Cauchy sequences in this 
text, but it is in its natural place in the chapter on sequences, which is well after the 
chapter on the construction of the real numbers.) 


Features 


There are many undergraduate books in real analysis, but the author has not found 
any with the exact same choice of material and pedagogical concerns as this text. 
Some of the distinguishing features of this text are as follows. 


Thorough Treatment of the Real Numbers 


At the heart of real analysis are the properties of the real numbers. Whereas most 
introductory real analysis texts move as quickly as possible to the core topics of 
calculus (such as limits, derivatives and integrals) by giving relatively brief treatments 
of the axioms for the real numbers and the consequences of those axioms, this text 
emphasizes the importance of the properties of the real numbers as the basis of real 
analysis. Hence, the real numbers and their properties are developed in more detail 
than is found in most other introductory real analysis texts. The goal of the text is for 
students to have a thorough understanding of the fundamentals of real analysis, not to 
cover as much ground as possible. 


Multiple Entryways 


A particularly distinctive feature of this text is that it offers three ways to enter into 
the study of the real numbers. 

Entry 1, which yields the most complete treatment of the real numbers, begins 
with the Peano Postulates for the natural numbers, and then leads to the construction 
of the integers, the rational numbers and the real numbers, proving the main properties 
of each set of numbers along the way. 

Entry 2, which is more efficient than Entry 1 but more detailed than Entry 3,
skips over the axiomatic treatment of the natural numbers, and begins instead with an 
axiomatic treatment of the integers. It is first shown that inside the integers sits a copy 
of the natural numbers, and after that the rational numbers and the real numbers are 
constructed, and their main properties proved. 

Entry 3, which is the most efficient approach to the real numbers, starts with an 
axiomatic treatment of the real numbers. It is shown that inside the real numbers sit 
the natural numbers, the integers and the rational numbers. This approach is the one 
taken in most contemporary introductions to real analysis, though we give a bit more
detail about the natural numbers, integers and rational numbers than is common.

The existence of the three entryways into the real numbers allows for great
flexibility in the use of this text. For a first real analysis course, whether for mathematics
majors or prospective secondary school mathematics teachers, Entry 3 should be
used; for a second real analysis course for prospective secondary school mathematics
teachers, or as supplementary reading for a standard introductory real analysis course,
Entry 1 or Entry 2 should be used. No matter which entry is used, all students end up
knowing the same properties of the real numbers, and hence are equally prepared for 
the subsequent material. 


Follows Order of Material in Calculus Courses 


Undergraduate real analysis courses are often organized according to the goal of 
preparing the students for more advanced mathematics courses. Such a design,
however, does not necessarily lead to the best pedagogical approach. Whereas many
of the more advanced aspects of real analysis are quite abstract, the motivation for 
introductory real analysis is the need for a rigorous foundation for calculus. Given 
that the students in an introductory real analysis course have already had courses in 
calculus, and given that pedagogically it is best to relate new material to that which is 
already familiar, this text presents the basic material in real analysis in an order that 
is closer to that encountered in calculus courses than is found in most real analysis 
books. 


Sequences Later in the Text 


In standard calculus courses, sequences usually receive very minimal treatment, and 
are discussed only as much as they are needed as partial sums of series. In real 
analysis, by contrast, sequences are a very important tool, and they are treated in great 
detail in most real analysis texts. Moreover, in most such texts sequences are located 
right after the preliminary treatment of the real numbers, and prior to the discussion 
of limits, derivatives and integrals, both because the definition of the convergence of 
sequences is viewed as slightly easier to learn than the definition of the convergence 
of functions, and because some of the major theorems about sequences (such as the 
Bolzano-Weierstrass Theorem) are used in the proofs of some important theorems
about continuous functions, derivatives and integrals (such as the Extreme Value 
Theorem). 

In this text, by contrast, sequences are treated after the chapters on limits,
derivatives and integrals, similarly to the order of material in a calculus course. Whereas
sequences are used in many real analysis books in the proofs of some of the important
theorems concerning functions, it turns out that all such theorems can be proved
without the use of sequences, where instead of using the Bolzano-Weierstrass Theorem
and similar results, a direct appeal is made to the Least Upper Bound Property, or 
to direct consequences of that property. As such, it is possible to treat continuous 
functions, derivatives and integrals without the added technicality of sequences, and 
in the order familiar from calculus courses. Moreover, the use of sequences in proofs 
where they are not needed, while sometimes making for short and clever proofs, may 
at times obscure the essential ideas of the theorem being proved.

Of course, wherever they are placed, sequences are a very important topic in
real analysis, and they are given a thorough treatment in this text, with all the usual 
theorems proved. 


Integration via Riemann Sums 


There are two standard ways of defining the Riemann Integral that are found in 
introductory real analysis texts: via Riemann sums, and via upper and lower integrals. 
The latter approach is used by many (if not most) current introductory real analysis 
books, and it gives a fast route to proving the important theorems about integrals. 
However, given that the treatment of integrals in calculus courses is via Riemann 
sums, this text also uses that approach in its definition of integrals, so that students can 
understand the rigorous treatment of integrals in terms of what they had previously 
seen in calculus courses. 


Equivalence of Various Theorems with the Least Upper Bound Property 


Every student in a real analysis course learns that the Least Upper Bound Property is 
at the heart of what the real numbers consist of, and it is the basis for the proofs of 
many of the main theorems of real analysis, such as the Extreme Value Theorem and 
the Bolzano-Weierstrass Theorem. Many of these theorems, for example the two just
mentioned, are in fact logically equivalent to the Least Upper Bound Property, and in 
this text we present a proof of this logical equivalence, which is not commonly found 
in real analysis books. 
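
For reference, the Least Upper Bound Property is usually stated along the following lines. This is a standard textbook formulation given here only as a reminder; the set name A is an illustrative choice, and the exact wording used later in the text may differ.

```latex
% Least Upper Bound Property (standard formulation; A is an illustrative name):
\text{If } A \subseteq \mathbb{R} \text{ is non-empty and bounded above, then }
\sup A \text{ exists in } \mathbb{R}.
```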


Thorough Discussion of Transcendental Functions 


Logarithmic, exponential and trigonometric functions are familiar to students from 
precalculus and calculus courses. Whereas most introductory real analysis books 
either ignore these functions or give them a cursory treatment, in this text these 
functions are defined rigorously, and their basic properties are proved in detail. Of 
particular note is our treatment of the sine and cosine functions; these functions are 
trickier to define rigorously than logarithms and exponentials, but nonetheless deserve 
a thorough exposition. 


Discussion of Area and Arc Length 


The main motivation for the development of the definite integral is to compute areas 
of certain regions in the plane. However, the very important fact that the definite 
integral of a non-negative function yields the area under the graph of the function, 
while regularly asserted, is rarely proved in introductory real analysis books (indeed, 
the concept of area is rarely defined rigorously), which leads not only to a gap in rigor 
but also to an incomplete understanding of the concept of area. In this text, we give 
a thorough discussion of area and arc length, starting with geometric definitions of 
these concepts, and then proofs that in appropriate cases, they can be computed via 
definite integrals.


More about π


Students are familiar with the number π from a very young age, where it is discussed in
the context of the circumference and area of circles. The number π is also introduced
into the study of trigonometric functions in precalculus and calculus courses. In real
analysis, if the trigonometric functions are to be studied, then the number π cannot be
avoided. In this text, a particularly detailed treatment of π is given, in order to clarify
the relation between the geometric approach to this number (via the circumference of
circles) and the analytic approach to it (via the definition of the trigonometric functions
using integrals). 


Reflections for Every Section 


The heart of mathematics is the details, and students in a real analysis course quite 
naturally get very caught up in the ε’s and δ’s. However, it is useful at times to step
back from the details and ask broader questions, such as: why are things done as 
they are; why are some aspects of the material straightforward and other aspects not; 
whether all the hypotheses of the theorems are really needed; and whether there might 
be an easier way to define or prove things. In real analysis, moreover, it is also helpful 
to compare the way things are done in that course with the way they were done (or 
not done) in calculus courses that the student took previously. Hence, every section of 
this text concludes with some very brief remarks that look back upon the material in 
the section, often in the context of what the student has seen prior to real analysis. The 
main purpose of these remarks is not, however, simply to convey the author’s thoughts 
about the material, but is rather to encourage the reader to engage in her own similar 
reflections upon the material discussed in this text, and upon other mathematical ideas 
encountered subsequently. 


Historical Remarks for Every Chapter 


The material in this book is presented in the logical order of development that is 
now standard for real analysis, but which is quite different from the way the subject 
developed historically. Though it would be very inefficient to learn the details of real 
analysis in the order in which it occurred historically, because it took mathematicians 
a rather circuitous route to reach the understanding we have today, it is nonetheless 
beneficial for mathematicians to know something about how important topics such as 
calculus arose. Such historical context is especially valuable for prospective teachers, 
though it can benefit all students of real analysis, not in understanding the details of 
rigorous definitions and proofs, but in seeing the bigger picture. Hence, each chapter 
concludes with a historical discussion of the material in the chapter. 

The author is not a historian, and he hopes that the historical material provided is 
both useful and informative. For a thorough and engaging treatment of the history of 
mathematics in general, the reader is referred to [Kat98]. Because of the availability 
of the wonderful website [OR], which has extensive biographical information on 
every mathematician about whom the reader has heard (and many others as well), the
historical material in this text does not include biographical information (other than
dates of birth and death) about the mathematicians who developed real analysis. 


Errors 


In spite of the author’s best effort, there will inevitably be some errors in this text. 
If the reader finds any such errors, it would be very helpful if she would send them 
to the author at bloch@bard.edu. An updated list of errors is available at
http://math.bard.edu/bloch/rnra_errata.pdf.


To the Student


The Material 


Calculus has a number of important aspects: intuition, computation, application and 
rigor. Standard introductory calculus courses in American universities and colleges 
treat the first three of these well, but gloss over the fourth. At base, a first course in 
real analysis, such as a course that uses this text, is an exposition of the rigorous ideas 
that make calculus work. 

Of course, if all that a course in real analysis did was to verify that everything 
done in calculus is correct, it would hardly be worth the effort, because most of us are 
willing to take on faith that the people who developed calculus got it right. In fact, in 
the course of giving rigorous definitions and proofs of the main concepts of calculus, 
a real analysis course introduces the student to many fascinating and powerful new
concepts and techniques of proof. These ideas, for example rigorous definitions of
limits and continuity, turn out to be useful both in the further study of real analysis
and in fields such as topology and complex analysis. As such, a first course in
real analysis is both a completion of the study of calculus commenced in introductory 
calculus courses, and also an entrance into further advanced study in a number of 
branches of mathematics. 

The history of calculus is virtually the opposite of how we present it in a modern 
real analysis course. Calculus started with the ideas of derivatives and integrals, as 
well as their applications; the details at the time were not entirely rigorous by modern 
standards. As new phenomena were discovered, the need for a more rigorous treatment
of derivatives and integrals was understood, and that led to the formulation of the 
definition of limits, and the use of limits as the basis for all other aspects of calculus. 
Finally, as the properties of limits were explored, it was realized that completely 
rigorous proofs regarding limits could only be obtained if we had a rigorous treatment 
of the real numbers. Today, we learn real analysis in its logical, as opposed to historical, 
order, which means starting with a detailed look at the real numbers, then limits and 
continuity, and then derivatives and integrals, followed by additional topics such as 
sequences and series.

This text, which is designed for a one-semester course in real analysis, has more
material than can typically be covered in one semester, in order to accommodate 
different choices of emphasis by the instructor (or by yourself, if you are reading this 
book on your own). An outline of the text is as follows. 


Chapters 1 and 2: Construction of the Real Numbers and Properties of the Real 
Numbers 


There are two standard ways to discuss the real numbers rigorously: either start with 
an axiomatic treatment of the natural numbers or the integers, and then construct the 
real numbers, or start with an axiomatic treatment of the real numbers. The former 
approach is treated in Chapter 1, and the latter in Section 2.2. The properties of the 
real numbers are explored in the rest of Chapter 2. 


Chapters 3-5: Limits and Continuity, Differentiation and Integration


These three chapters contain a rigorous treatment of the core material from calculus.


Chapters 6 and 7: Limits to Infinity and Transcendental Functions


These two chapters contain optional material, the first covering some topics usually
found in a second-semester calculus course, for example l’Hôpital’s Rule and improper
integrals, and the second providing a rigorous treatment of logarithmic, exponential 
and trigonometric functions. 


Chapters 8 and 9: Sequences and Series 


Sequences and series, often discussed in a second-semester calculus course, take on a 
more important role in real analysis, and are core topics in a first course in the subject. 


Chapter 10: Sequences and Series of Functions 


This chapter is a fitting close to the book, because it involves many of the ideas that 
were discussed in earlier chapters. The highlights of this chapter include a treatment 
of Taylor series, and an example of a function that is continuous everywhere but 
differentiable nowhere. 


Prerequisites 


To be ready for an introductory real analysis course, a student must have taken the 
standard calculus courses, must have some experience writing rigorous mathematical 
proofs, and must be familiar with the basic properties of sets, functions and relations 
(as found, for example, in [Blo10]). Because many students find the material in real 
analysis a bit harder to learn than the material in other standard junior-senior level
proofs-based mathematics courses such as abstract algebra, it is often recommended,
though not necessarily required, that students have taken another junior-senior level
proofs-based course prior to studying real analysis, for the sake of having more
experience with rigorous mathematical proofs. 


Notation 


The notation used in this book is, as much as possible, standard. We assume that the 
reader is familiar with basic notation involving sets and functions, for example unions 
and intersections of sets, and we will not review all such notation here. However, 
because not all mathematical notation is entirely standardized (a tribute to the
decentralized ethos of the mathematical community), we list here a few items of notation
that we will use, but which might not have been encountered previously by the reader. 
Of course, we will define a lot of new notation throughout this book. 


ℕ            natural numbers
ℤ            integers
ℚ            rational numbers
ℝ            real numbers
ℕ ∪ {0}      non-negative integers
ℤ − {0}      non-zero integers
(0, ∞)       positive real numbers
[0, ∞)       non-negative real numbers
ℝ − {0}      non-zero real numbers
A ⊆ B        subset, not necessarily proper
A ⫋ B        proper subset
A − B        set difference
1_A          identity map on the set A
f|_A         restriction of the function f to the set A
f⁻¹          inverse function of a bijective function f


Exercises 


Mathematics is learned by actively doing it, not by passively reading about it. Certainly, 
listening to lectures and reading the textbook is an important part of any mathematics 
course, but the real learning occurs when the student does exercises, which provide 
an opportunity for the student to make use of the concepts discussed in the course, 
and to see whether she understands these concepts. 

The exercises in this text have been arranged in order so that in the course of 
working on an exercise, a student may use any previous theorem or exercise (whether 
or not she did it), but not any subsequent result (unless stated otherwise). If an exercise 
makes use of a previous exercise, that previous exercise is sometimes noted at the end 
of the exercise where it is used, in case the reader has not done the previous exercise; 
in most cases it is not mentioned if previous theorems, lemmas and the like proved in 
the text are used because it is assumed that the reader has read the text (though in a 
few particularly tricky exercises a relevant result from the text is mentioned as a hint).

Many of the exercises are used in the text, and are labeled as such. There are
two reasons why so many of the exercises are used in the text: to streamline some 
of the lengthier proofs by leaving manageable parts to the students, and to provide 
exercises that are actually useful, as opposed to exercises that exist simply for the 
sake of having exercises (though there are plenty of those too, to provide sufficient 
practice for the student). 

In some of the exercises the reader is asked to prove statements that have a number 
of similar cases, for example statements involving functions that are increasing or 
decreasing. For such exercises it is acceptable to do one case in detail, and to say that 
the other cases are similar and that the details are omitted, as long as you are sure that 
the details really are similar. 

Finally, we note that there is a large variation in the level of difficulty of the 
exercises, ranging from some that are very straightforward (being slight variants of 
proofs in the text), to some that are quite challenging, and with many in between. 
No attempt has been made to rate the difficulty of the exercises, such a rating being 
necessarily subjective. 


Writing Proofs 


Doing exercises is an important aspect of learning mathematics; writing the exercises 
carefully makes them even more effective. Advanced mathematics is not always 
easy, and everyone makes honest mathematical errors in the process of learning such 
material. There is no reason, however, to have avoidable mistakes due to carelessness 
and poor writing if sufficient care is taken. 

One of the most common sources of errors, if not the most common, when students
first encounter real analysis involves problems with quantifiers. A number of very
important definitions in real analysis, for example the definitions of limits of functions, 
continuity, uniform continuity, and limits of sequences, involve two quantifiers in a 
given order. The need to prove statements that involve multiple quantifiers is what 
makes real analysis a bit harder for many undergraduates to grasp upon first encounter 
than linear algebra and abstract algebra, which also require proofs, but which do not 
have such complications with quantifiers. The best way to be careful with quantifiers 
is to write proofs very carefully and precisely. In particular, when proving a statement 
with quantifiers, it is crucial to deal with the quantifiers in the exact order in which 
they are given in the statement that is being proved. 
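
To illustrate why the order of quantifiers matters, compare the usual definitions of continuity and uniform continuity for a function f defined on a set A of real numbers. This is a standard comparison offered as a sketch, not a quotation from the text; the symbols f, A, c, x, ε and δ are the conventional ones and are assumptions here. The two statements use the same inequalities and differ only in where "for all c" appears.

```latex
% Continuity on A: \delta may depend on both \varepsilon and the point c.
\forall c \in A \;\, \forall \varepsilon > 0 \;\, \exists \delta > 0 \;\, \forall x \in A :
  |x - c| < \delta \implies |f(x) - f(c)| < \varepsilon

% Uniform continuity on A: \delta may depend on \varepsilon only.
\forall \varepsilon > 0 \;\, \exists \delta > 0 \;\, \forall c \in A \;\, \forall x \in A :
  |x - c| < \delta \implies |f(x) - f(c)| < \varepsilon
```

In a proof of the first statement, c and ε are fixed before δ is chosen; in a proof of the second, δ must be chosen before any point c is named.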

Another very common error for beginners in real analysis, which also does not
occur as much in subjects such as linear algebra and abstract algebra, is the failure to
distinguish between one’s scratch work and the actual proof. A proof must always
start with what we know, and deduce that which we want to prove. It is a common 
logical error to try to prove something by assuming the thing we are supposed to 
prove, and then working backwards until we arrive at something that we already know 
(for example that something equals itself). Such “backwards proofs” are, for reasons 
this author does not understand, extremely common in high school mathematics, 
though there the arguments are often sufficiently simple, and reversible, so that no
harm is done other than conveying the incorrect impression that “backwards proofs”
are actual proofs, which they are not. By contrast, in real analysis it is crucial not
to confuse “backwards proofs” with real proofs. For example, as we will see very
clearly in Example 3.2.3, the ε-δ proofs discussed in Section 3.2 often require first
some scratch work that is “backwards,” and then a rather different-looking proof that 
is “forwards.” Hence, it is very important in the proofs that you write for the exercises 
in this book that you distinguish between how you think of a proof, which can be any 
combination of “backwards,” “forwards” and anything else, and how you write the 
final draft of the proof, which must be very precise in going from what we assume to 
what we want to prove.

A few additional points about writing mathematical proofs are the following:

• Strategize the outline of a proof before working out the details; the outline of a
proof is determined by what is being proved, not by what is hypothesized.

• Use definitions precisely as stated.

• Do not omit steps in proofs; when in doubt, prove it.

• Justify each step in a proof, citing the appropriate results from the text as needed.

• If a step in a proof is skipped, for example because it is very similar to a previous
step, state explicitly that that is the case.

• Use correct grammar, including full sentences and proper punctuation.

• Use “=” signs properly.

• Proofs should stand on their own; check your proofs by reading them as if they
were written by someone else.


See [Blo10, Section 2.6], [Gil87], [Hig98], [KLR89] and [SHSD73] for further 
discussion of writing mathematics. 

The bottom line is to write your proofs very carefully, because doing so will help 
you learn the material in this book. 


To the Instructor 


This text, which is designed for a one-semester course in single-variable real analysis, 
has more material than can typically be covered in one semester, in order to provide 
flexibility for the instructor. Moreover, this text has been designed to accommodate 
three different types of real analysis courses, each for a different target audience. 
Suggested course outlines for these three audiences are given below, though of course 
each instructor should be guided by her own choices more than by the author’s 
suggestions. 


Standard Introduction to Real Analysis 


This course is a traditional first course in real analysis for mathematics majors, and 
for other students (for example physics majors) who want to be prepared for advanced 
work in mathematics. This course covers all the typical single-variable topics, albeit 
with a slightly more thorough treatment of the properties of the real numbers than 
usual, and with sequences placed after differentiation and integration. 


Chapter 2, Properties of the Real Numbers: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6. 
Chapter 3, Limits and Continuity: 3.1, 3.2, 3.3, 3.4, 3.5. 

Chapter 4, Differentiation: 4.1, 4.2, 4.3, 4.4, 4.5. 

Chapter 5, Integration: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7. 

Chapter 8, Sequences: 8.1, 8.2, 8.3, 8.4. 

Chapter 9, Series: 9.1, 9.2, 9.3, 9.4, 9.5. 

Chapter 10, Sequences and Series of Functions: 10.1, 10.2, 10.3, 10.4, 10.5. 


Introduction to Real Analysis for Prospective Secondary School Teachers 


This course is for prospective secondary school teachers who have not previously 
studied real analysis, for example undergraduate mathematics education majors or 
M.A.T. students in mathematics who did not study real analysis as undergraduates, 
and who seek a good understanding of the material from calculus that they will be 
teaching. This course focuses on core topics such as properties of the real numbers 
(important for any mathematics teacher separately from its role in calculus), limits,
derivatives and integrals, and it also has extra topics of direct concern to future 
teachers of calculus such as exponential and logarithmic functions (at the expense of 
part of the treatment of series). 


Chapter 2, Properties of the Real Numbers: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6. 
Chapter 3, Limits and Continuity: 3.1, 3.2, 3.3, 3.4, 3.5. 

Chapter 4, Differentiation: 4.1, 4.2, 4.3, 4.4, 4.5, 4.6. 

Chapter 5, Integration: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.9. 

Chapter 7, Transcendental Functions: 7.1, 7.2. 

Chapter 8, Sequences: 8.1, 8.2, 8.3, 8.4. 

Chapter 9, Series: 9.1, 9.2, 9.3, 9.4, 9.5. 


Second Course in Real Analysis for Prospective Secondary School Teachers 


This course is for prospective secondary school teachers, for example M.A.T. students 
in mathematics who already had an introductory course in real analysis as undergrad- 
uates, who seek a more thorough understanding of the material from calculus that 
they will be teaching. In addition to a review of core topics such as limits, derivatives 
and integrals, this course includes a detailed look at topics of direct concern to future 
teachers of calculus but not usually treated fully in many introductory real analysis 
courses, for example the construction of the real numbers, transcendental functions, 
improper integrals, area, arc length and π.


Chapter 1, Construction of the Real Numbers: 1.1, 1.2, 1.3 (or 1.4 instead of the pre- 
vious two sections), 1.5, 1.6, 1.7. 

Chapter 2, Properties of the Real Numbers: 2.1, 2.3, 2.5, 2.6, 2.7, 2.8. 

Chapter 3, Limits and Continuity: Review as needed. 

Chapter 4, Differentiation: Review as needed, 4.6. 

Chapter 5, Integration: Review as needed, 5.8, 5.9. 

Chapter 6, Limits to Infinity: 6.1, 6.2, 6.3, 6.4. 

Chapter 7, Transcendental Functions: 7.1, 7.2, 7.3, 7.4. 

Chapter 8, Sequences: Review as needed, 8.3, 8.4. 

Chapter 9, Series: Review as needed, 9.4, 9.5. 

Chapter 10, Sequences and Series of Functions: 10.1, 10.2, 10.3, 10.4, 10.5. 


1 


Construction of the Real Numbers 


1.1 Introduction 


Real analysis—which in its most basic form is the rigorous study of the ideas in 
calculus—takes place in the context of the real numbers, because the real numbers 
have the properties needed to allow things such as derivatives and integrals to work 
as we would like them to. A rigorous study of derivatives and integrals requires a 
rigorous treatment of the fundamental properties of the real numbers, and that is the 
topic of this chapter and the next. 

Inside the set of real numbers (which intuitively form the complete “number line”) 
there are three familiar sets of numbers: the natural numbers (intuitively 1,2,3,...), the 
integers (intuitively ..., −2, −1, 0, 1, 2, ...) and the rational numbers (the fractions).
We use the standard symbols N, Z, Q and R to denote the natural numbers, the
integers, the rational numbers and the real numbers, respectively. These sets are
subsets of one another in the order N ⊆ Z ⊆ Q ⊆ R, where each set is a proper subset
of the next. 

This text offers three ways to enter into the study of the real numbers. 

Entry 1, which starts in Section 1.2 in the present chapter, and which offers the 
most complete treatment, begins with axioms for the natural numbers, and then leads 
to constructions of the integers, the rational numbers and the real numbers, proving 
the main properties of each set of numbers along the way. 

Entry 2, which starts in Section 1.4 in the present chapter, skips over the axiomatic 
treatment of the natural numbers, and begins instead with an axiomatic treatment of 
the integers. It is shown that inside the integers sits a copy of the natural numbers. 
After that, the rational numbers and the real numbers are constructed, and their main 
properties proved. This approach is a bit shorter and simpler than that of Entry 1, 
though still more detailed than that of Entry 3. 

Entry 3, which starts in Section 2.2 in the next chapter, commences with an 
axiomatic treatment of the real numbers. This approach, which is the one taken in 
most introductory real analysis books, is the most efficient route to the core topics 
of real analysis, but it gives the least insight into the number systems. In Section 2.4 
it is shown that inside the real numbers sit the natural numbers, the integers and the
rational numbers, with all their expected properties. 

All three entries lead to proofs of the same properties of the real numbers, and 
as such all three are reasonable starting points in the study of real analysis. The 
reader should now proceed to the entry of her choice, after first reading the following 
definition, which is needed to define concepts such as addition, multiplication and 
negation of real numbers. 


Definition 1.1.1. Let S be a set. A binary operation on S is a function S × S → S. A
unary operation on S is a function S → S. △


Let S be a set, and let ∗: S × S → S be a binary operation. If x, y ∈ S, the correct
way to write the result of doing the operation ∗ to the pair (x, y) would be ∗((x, y)).
However, because such notation is both quite cumbersome to write, and does not
resemble the way we write familiar binary operations such as addition of numbers,
we will write x ∗ y instead of ∗((x, y)). Similarly, if −: S → S is a unary operation,
and if x ∈ S, the correct way to write the result of doing the operation − to x would be
−(x), but we write the more familiar −x.

Additionally, let T ⊆ S be a subset. We say T is “closed” under the binary operation
∗ if x ∗ y ∈ T for all x, y ∈ T. A similar definition holds for a subset being closed
under a unary operation. We note that this use of the term “closed” will be employed 
only informally, and only occasionally, in contrast to the very important use of this 
same word in Definition 2.3.6, where closed intervals are defined. (A much more 
general, and very important, use of the word “closed” can be found in any introductory 
topology text, for example [Mun00].) 
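
As an informal illustration of these definitions, the following Python sketch treats a binary operation as a function of two arguments and tests closure of a subset by checking every pair. The set {0, 1, ..., 5} with addition modulo 6, and the names is_closed and add_mod_6, are hypothetical choices made for this example only.

```python
from itertools import product


def is_closed(T, op):
    """Return True if op(x, y) lies in T for all x, y in T."""
    return all(op(x, y) in T for x, y in product(T, repeat=2))


def add_mod_6(x, y):
    """A binary operation on S = {0, 1, 2, 3, 4, 5} (illustrative choice)."""
    return (x + y) % 6


print(is_closed({0, 2, 4}, add_mod_6))  # True
print(is_closed({1, 3, 5}, add_mod_6))  # False, since 1 + 1 = 2 is not in the subset
```

Writing add_mod_6(x, y) corresponds to the prefix notation ∗((x, y)); the infix form x ∗ y is the notational convenience described above.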


1.2 Entry 1: Axioms for the Natural Numbers 


The simplest, and most fundamental, set of numbers is the set of natural numbers, 
that is, the numbers 1,2,3,4,.... In this section we will give an axiomatic treatment 
of these numbers, and in subsequent sections we will construct all the other familiar 
sets of numbers (integers, rational numbers, real numbers) in terms of the natural 
numbers. 

A good axiomatic system is one that assumes as little as possible, and from which 
as much as possible can be proved. To make an efficient axiomatization for the natural 
numbers, we need to strip these numbers down to their bare essentials. Intuitively, 
we know various things about the natural numbers, such as the existence and basic 
properties of the binary operations addition and multiplication, and the relation less 
than. How few of these notions can we take as axioms, from which we can deduce 
everything else that we need to know about the natural numbers? It turns out that 
very little is needed for an axiomatization of the natural numbers—neither addition 
nor multiplication, nor the relation less than (they will all be constructed from our 
axioms). 

The standard axiomatization of the natural numbers, known as the Peano Postu- 
lates, is based upon the notion of proof by induction. We assume that the reader is 
familiar with proof by induction, at least informally. We will review the practical use
of such proofs in Section 2.5; at present we need proof by induction for theoretical 
purposes. 

In its most bare-bones form, the natural numbers will consist of a set (denoted 
N), a distinguished element (denoted 1) and a unary operation on the set (denoted 
s: N → N). Intuitively, the function s takes each natural number to its successor,
which we would normally think of as being the result of adding 1 to each natural
number, though that cannot be formally stated quite yet, because we do not have 
the notion of addition in our axioms for the natural numbers. The Peano Postulates 
require that three entities N, 1 and s satisfy a few simple properties. One of these 
properties, listed as Part (c) of Axiom 1.2.1 below, is just the formal statement that 
proof by induction works. 

It is rather surprising, upon first encounter, that we can get away with assuming 
so little about the natural numbers. For example, we make no axiomatic assumption 
about addition and multiplication; these operations will be constructed using the 
Peano Postulates. As the reader will see from the details of our development of the 
various number systems, the Peano Postulates are incredibly powerful. 


Axiom 1.2.1 (Peano Postulates). There exists a set N with an element 1 ∈ N and a
function s: N → N that satisfy the following three properties.

a. There is no n ∈ N such that s(n) = 1.

b. The function s is injective.

c. Let G ⊆ N be a set. Suppose that 1 ∈ G, and that if g ∈ G then s(g) ∈ G. Then
G = N.


Observe that it does not say in the Peano Postulates (Axiom 1.2.1) that the set N 
is unique, though in fact that turns out to be true; see Exercise 1.2.8 for details. We 
can therefore make the following definition. 


Definition 1.2.2. The set of natural numbers, denoted N, is the set the existence of
which is given in the Peano Postulates. △


Part (a) of the Peano Postulates says, intuitively, that 1 is the “first” number in
N. Parts (a) and (b) together are needed to ensure that (N, 1, s) is infinite. To see
why, let M = {1, p}, and let s: M → M be defined by s(1) = p and s(p) = p. It is
straightforward to see that Parts (a) and (c) of the postulates hold for this M, 1 and 
s, even though M is not what we would intuitively want to call the set of natural 
numbers; of course, this function s does not satisfy Part (b) of the Peano Postulates. 
Using the same set M but with s(1) = p and s(p) = 1 shows that a finite set may
satisfy Parts (b) and (c) of the postulates, but not Part (a). Hence, to ensure that a 
set satisfying the Peano Postulates truly models the natural numbers, we need both 
Parts (a) and (b) of the postulates (or something like them). That we need something 
like Part (c) of the postulates seems reasonable, because we will need to use proof by 
induction in a number of our proofs about the natural numbers. 
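
The two finite examples above can be checked mechanically. The following Python sketch is a brute-force check under stated assumptions: the function name check_peano and the representation of s as a dictionary are hypothetical, and Part (c) is tested by examining every subset, which is feasible only because M is finite.

```python
from itertools import combinations


def check_peano(M, one, s):
    """Report whether Parts (a), (b), (c) hold for the finite structure (M, one, s)."""
    elements = list(M)
    # Part (a): there is no n with s(n) = one.
    part_a = all(s[n] != one for n in elements)
    # Part (b): s is injective, so distinct elements have distinct images.
    part_b = len({s[n] for n in elements}) == len(elements)
    # Part (c): every subset G that contains `one` and is closed under s equals M.
    part_c = True
    for size in range(len(elements) + 1):
        for G in map(set, combinations(elements, size)):
            if one in G and all(s[g] in G for g in G) and G != set(elements):
                part_c = False
    return part_a, part_b, part_c


M = {1, "p"}
print(check_peano(M, 1, {1: "p", "p": "p"}))  # (True, False, True)
print(check_peano(M, 1, {1: "p", "p": 1}))    # (False, True, True)
```

The first structure satisfies Parts (a) and (c) but not (b), and the second satisfies Parts (b) and (c) but not (a), in agreement with the discussion above.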

We cannot prove that N is precisely what our intuition tells us it should be, because 
we cannot prove things about our intuition. The best we can do, and we will indeed do 
this, is to prove that N satisfies all the basic properties we can think of for the natural 
numbers. Formally, we simply define the natural numbers to be the set N given in the
Peano Postulates. 

Our first result about the natural numbers is the following simple lemma, which 
certainly fits in with our intuitive sense that for every natural number other than 1, 
there is another natural number that precedes it. The proof of the following lemma is 
a typical use of the Peano Postulates. 


Lemma 1.2.3. Let a ∈ N. Suppose that a ≠ 1. Then there is a unique b ∈ N such that
a = s(b).


Proof. We start with uniqueness. Suppose that there are n, m ∈ N such that a = s(n)
and a = s(m). Then s(n) = s(m). By Part (b) of the Peano Postulates we know that s
is injective, and therefore n = m.

To prove existence, let


G = {1} ∪ {c ∈ N | there is some b ∈ N such that s(b) = c}.


We will use Part (c) of the Peano Postulates to prove that G = N, which will imme-
diately imply the existence part of this lemma. It is clear that G ⊆ N and that 1 ∈ G.
Now let n ∈ G. We need to show that s(n) ∈ G. Let p = s(n). To show that p ∈ G, we
will show that p ∈ {c ∈ N | there is some b ∈ N such that s(b) = c}. Let b = n. Then
s(b) = p by the definition of p. It follows that p ∈ G, and therefore s(n) ∈ G. Hence
G = N.


We now want to define the binary operations addition and multiplication for the 
natural numbers, using only the Peano Postulates, and results derived from these 
postulates (see Section 1.1 for the definition of binary operations). However, before 
we can define these binary operations, which are given in Theorem 1.2.5 and Theo- 
rem 1.2.6 below, we need to prove the following theorem, which provides the main 
tool in our definitions of addition and multiplication. This theorem, called Definition 
by Recursion, allows us to define a function with domain N by defining the function 
at 1, and then defining it at n + 1 in terms of the definition of the function at n. (See
Section 2.5 for further discussion of the practical use of Definition by Recursion, as 
well as some examples.) It is important to recognize that recursion, while intimately 
related to induction, is not the same as induction (though it is sometimes mistakenly 
thought to be); the essential difference is that induction is used to prove statements 
about things that are already defined, whereas recursion is used to define things. 

The proof of the following theorem is our most difficult proof involving the 
natural numbers, and to avoid interrupting the flow of the present section, this proof 
is given in Section 2.5 (where the theorem is restated as Theorem 2.5.5). However, 
because the proof of this theorem relies upon nothing other than the Peano Postulates 
(Axiom 1.2.1), and because we will need this theorem very soon to define addition 
and multiplication on the natural numbers, it is important that the reader not skip over 
the statement of this theorem; the reader who wishes to read the proof now can safely 
skip ahead and read the proof of Theorem 2.5.5, and then return to this point.


Theorem 1.2.4 (Definition by Recursion). Let H be a set, let e ∈ H and let k: H →
H be a function. Then there is a unique function f: N → H such that f(1) = e, and
that f ∘ s = k ∘ f.


The equation f ∘ s = k ∘ f in the statement of Theorem 1.2.4 means that f(s(n)) =
k(f(n)) for all n ∈ N. If s(n) were to be interpreted as n + 1 (as indeed it will be in
Theorem 1.2.5), then f(s(n)) = k(f(n)) would mean that f(n + 1) = k(f(n)), which
looks more like the familiar form of definition by recursion. Additionally, the equation
f ∘ s = k ∘ f can be expressed by saying that the following diagram is “commutative,”
which means that going either way around the square yields the same result. We 
will not be making use of commutative diagrams in this text, but in some parts of 
mathematics they are very useful. 


              s
        N ---------> N
        |            |
      f |            | f
        v            v
        H ---------> H
              k

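
The content of Theorem 1.2.4 can be sketched in Python. This is only a model, not part of the formal development: the natural numbers are represented by Python integers 1, 2, 3, ..., the successor is taken to be n + 1, and the name define_by_recursion is a hypothetical choice. Given e and k, the sketch returns the function f with f(1) = e and f(s(n)) = k(f(n)).

```python
def s(n: int) -> int:
    """Successor function on the model N = {1, 2, 3, ...} (an assumption of this sketch)."""
    return n + 1


def define_by_recursion(e, k):
    """Return f: N -> H with f(1) = e and f(s(n)) = k(f(n)) for all n."""
    def f(n: int):
        if n < 1:
            raise ValueError("f is defined only on the natural numbers 1, 2, 3, ...")
        value = e              # f(1) = e
        m = 1
        while m != n:          # walk through 1, s(1), s(s(1)), ... until n is reached
            value = k(value)   # f(s(m)) = k(f(m))
            m = s(m)
        return value
    return f


# Hypothetical example: e = 1 and k doubles its input, so f(n) = 2**(n - 1).
f = define_by_recursion(1, lambda h: 2 * h)
print([f(n) for n in range(1, 6)])  # [1, 2, 4, 8, 16]
```

Note that the sketch defines f; it does not prove anything about f, which reflects the distinction between recursion and induction.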
Now that we have Definition by Recursion available to us, we are ready to define 
addition on N, as given in the following theorem. Though it might not be evident at 
first why we choose the two properties of addition listed in this theorem rather than 
other, more common, properties of addition, with hindsight they turn out to work well, 
allowing for nice proofs of other properties of addition. 


Theorem 1.2.5. There is a unique binary operation +: N × N → N that satisfies the
following two properties for all n, m ∈ N.

a. n + 1 = s(n).

b. n + s(m) = s(n + m).


Proof. To prove uniqueness, suppose that there are two binary operations + and ⊕
on N that satisfy the two properties of the theorem. Let


G = {x ∈ N | n + x = n ⊕ x for all n ∈ N}.


We will prove that G = N, which will imply that + and ⊕ are the same binary
operation, which is what we need to show for uniqueness. It is clear that G ⊆ N. By
Part (a) applied to each of + and ⊕ we see that n + 1 = s(n) = n ⊕ 1 for all n ∈ N,
and hence 1 ∈ G. Now let q ∈ G. Let n ∈ N. Then n + q = n ⊕ q by hypothesis on q.
It then follows from Part (b) that n + s(q) = s(n + q) = s(n ⊕ q) = n ⊕ s(q). Hence
s(q) ∈ G. We now use Part (c) of the Peano Postulates to conclude that G = N.

For existence, we start by observing that for p ∈ N, we can apply Theorem 1.2.4
to the set N, the element s(p) ∈ N and the function s: N → N, to deduce that there
is a unique function f_p: N → N such that f_p(1) = s(p) and f_p ∘ s = s ∘ f_p. Let
+: N × N → N be defined by c + d = f_c(d) for all (c, d) ∈ N × N. Let n, m ∈ N. Then
n + 1 = f_n(1) = s(n), which is Part (a), and n + s(m) = f_n(s(m)) = (f_n ∘ s)(m) =
(s ∘ f_n)(m) = s(f_n(m)) = s(n + m), which is Part (b).


Part (a) of Theorem 1.2.5 says that the function s works exactly as we had initially
thought of it intuitively. From now on, we will often write n + 1 in places where we
would have written s(n).
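
The existence part of the proof of Theorem 1.2.5 can be mirrored in Python. The following is a sketch under the same modeling assumptions as before (natural numbers as Python integers starting at 1, successor as n + 1); the names define_by_recursion and add are hypothetical. Addition is built from the successor function alone, by taking e = s(c) and k = s in Definition by Recursion.

```python
def s(n: int) -> int:
    """Successor on the model N = {1, 2, 3, ...} (an assumption of this sketch)."""
    return n + 1


def define_by_recursion(e, k):
    """Return f with f(1) = e and f(s(n)) = k(f(n))."""
    def f(n: int):
        value, m = e, 1
        while m != n:
            value, m = k(value), s(m)
        return value
    return f


def add(c: int, d: int) -> int:
    """c + d = f_c(d), where f_c(1) = s(c) and f_c(s(n)) = s(f_c(n))."""
    f_c = define_by_recursion(s(c), s)
    return f_c(d)


print(add(3, 4))  # 7
```

Both properties of the theorem can be spot-checked on this model: add(n, 1) equals s(n), and add(n, s(m)) equals s(add(n, m)).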

Using addition, we now turn to the definition of multiplication. 


Theorem 1.2.6. There is a unique binary operation ·: N × N → N that satisfies the
following two properties for all n, m ∈ N.

a. n · 1 = n.

b. n · s(m) = (n · m) + n.


Proof. We leave uniqueness to the reader in Exercise 1.2.1.

Let q ∈ N. Let h_q: N → N be defined by h_q(m) = m + q for all m ∈ N. Applying
Theorem 1.2.4 to the set N, the element q ∈ N and the function h_q: N → N, implies
that there is a unique function g_q: N → N such that g_q(1) = q and g_q ∘ s = h_q ∘ g_q.
Let ·: N × N → N be defined by c · d = g_c(d) for all (c, d) ∈ N × N. The proof that
the two properties of the theorem hold is left to the reader in Exercise 1.2.1.
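
The construction of multiplication can be sketched in the same way. In the following Python model (natural numbers as Python integers starting at 1; all function names are hypothetical), multiplication is obtained from Definition by Recursion with e = c and k = h_c, where h_c(m) = m + c.

```python
def s(n: int) -> int:
    """Successor on the model N = {1, 2, 3, ...} (an assumption of this sketch)."""
    return n + 1


def define_by_recursion(e, k):
    """Return f with f(1) = e and f(s(n)) = k(f(n))."""
    def f(n: int):
        value, m = e, 1
        while m != n:
            value, m = k(value), s(m)
        return value
    return f


def add(c: int, d: int) -> int:
    """Addition as constructed in Theorem 1.2.5: c + d = f_c(d)."""
    return define_by_recursion(s(c), s)(d)


def mul(c: int, d: int) -> int:
    """c * d = g_c(d), where g_c(1) = c and g_c(s(n)) = h_c(g_c(n)) = g_c(n) + c."""
    h_c = lambda m: add(m, c)
    g_c = define_by_recursion(c, h_c)
    return g_c(d)


print(mul(3, 4))  # 12
```

A spot check of the two properties on small values is not a substitute for the proof requested in Exercise 1.2.1, but it shows what the properties assert.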


We will, as usual, write “nm” instead of “n · m,” except in cases of potential
ambiguity (for example, we will write “1 · 1” rather than “11”), or where the · makes
the expression easier to read.

The following theorem gives some of the very familiar properties of addition and 
multiplication of natural numbers. The main technique of proof for these properties is 
Part (c) of the Peano Postulates. The different parts of the theorem have been arranged 
so that to prove each, it is permissible to use everything stated previously, but not 
subsequently. This same strategy of using only previously stated results will hold 
throughout this text, except when otherwise noted. 


Theorem 1.2.7. Let a, b, c ∈ N.

1. If a + c = b + c, then a = b (Cancellation Law for Addition).
2. (a + b) + c = a + (b + c) (Associative Law for Addition).
3. 1 + a = s(a) = a + 1.
4. a + b = b + a (Commutative Law for Addition).
5. a + b ≠ 1.
6. a + b ≠ a.
7. a · 1 = a = 1 · a (Identity Law for Multiplication).
8. (a + b)c = ac + bc (Distributive Law).
9. ab = ba (Commutative Law for Multiplication).
10. c(a + b) = ca + cb (Distributive Law).
11. (ab)c = a(bc) (Associative Law for Multiplication).
12. If ac = bc, then a = b (Cancellation Law for Multiplication).
13. ab = 1 if and only if a = 1 = b.


Proof. We will prove Parts (1), (5), (6) and (12), leaving the rest to the reader in 
Exercise 1.2.2. 


(1) Let

G = {z ∈ N | if x, y ∈ N and x + z = y + z, then x = y}.


We will show that G = N, which will imply the desired result. Clearly G ⊆ N. To
show that 1 ∈ G, let j, k ∈ N and suppose that j + 1 = k + 1. Then s(j) = s(k) by
Theorem 1.2.5 (a), and so j = k by the injectivity of s (Part (b) of the Peano Postulates).
Hence 1 ∈ G. Now let r ∈ G. Further, let j, k ∈ N, and suppose that j + s(r) = k + s(r).
By Theorem 1.2.5 (b) we deduce that s(j + r) = s(k + r). Hence j + r = k + r by the
injectivity of s. Because r ∈ G we deduce that j = k. Therefore j + s(r) = k + s(r)
implies j = k, and it follows that s(r) ∈ G. We deduce that G = N by Part (c) of the
Peano Postulates.


(5) Suppose that a + b = 1; we will derive a contradiction. There are two cases.
First, suppose that b = 1. Then 1 = a + b = a + 1 = s(a), which is a contradiction
to Part (a) of the Peano Postulates. Now suppose that b ≠ 1. By Lemma 1.2.3 there
is some x ∈ N such that s(x) = b. By Theorem 1.2.5 (b) we then have 1 = a + b =
a + s(x) = s(a + x), again a contradiction to Part (a) of the Peano Postulates.


(6) Let


H = {z ∈ N | if y ∈ N then z + y ≠ z}.


We will show that H = N. Clearly H ⊆ N. By Part (5) of this theorem we know that
1 + k ≠ 1 for all k ∈ N, and it follows that 1 ∈ H. Now let r ∈ H. Suppose further
that there is some k ∈ N such that s(r) + k = s(r). By Part (4) of this theorem we see
that k + s(r) = s(r), and then by Theorem 1.2.5 (b) we deduce that s(k + r) = s(r).
Because s is injective (Part (b) of the Peano Postulates), it follows that k + r = r. Using
Part (4) of this theorem again we deduce that r + k = r, which is a contradiction to
the fact that r ∈ H. Hence there is no k ∈ N such that s(r) + k = s(r), and we deduce
that s(r) ∈ H. Hence H = N.


(12) Let

H = {x ∈ N | if y, z ∈ N and xz = yz, then x = y}.


Let j, k ∈ N, and suppose that 1 · k = jk. Suppose further that j ≠ 1. By Lemma 1.2.3
there is some t ∈ N such that j = s(t). By Parts (4), (7) and (9) of this theorem and
Theorem 1.2.6 (b), we see that k = 1 · k = jk = kj = ks(t) = kt + k = k + kt, which is
a contradiction to Part (6). Therefore j = 1, and hence 1 ∈ H.

Now let r ∈ H. Let j, k ∈ N, and suppose that s(r)k = jk. As before, by previous
parts of this theorem and Theorem 1.2.6 (b), we deduce that kr + k = jk. If j = 1 then
by previous parts of this theorem we see that k + kr = k, which is a contradiction to
Part (6) of this theorem. Hence j ≠ 1. By Lemma 1.2.3 there is some p ∈ N such that
j = s(p). Therefore kr + k = s(p)k, and using the same ideas as before we see that
kr + k = kp + k. By previous parts of this theorem it follows that rk = pk. Because
r ∈ H, it follows that r = p. Therefore s(r) = s(p) = j. Hence s(r) ∈ H, and we
deduce that H = N.


Observe that Theorem 1.2.7 (3) states that the function s is just what we intuitively
thought it would be. From now on we will write a + 1 instead of s(a) for a ∈ N.
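
Several parts of Theorem 1.2.7 can be spot-checked on small values using addition and multiplication built from the successor function. The following Python sketch is a sanity check on a finite sample, not a proof; the model (Python integers starting at 1) and all function names are assumptions made for illustration.

```python
from itertools import product


def s(n):
    """Successor on the model N = {1, 2, 3, ...} (an assumption of this sketch)."""
    return n + 1


def define_by_recursion(e, k):
    """Return f with f(1) = e and f(s(n)) = k(f(n))."""
    def f(n):
        value, m = e, 1
        while m != n:
            value, m = k(value), s(m)
        return value
    return f


def add(c, d):
    return define_by_recursion(s(c), s)(d)


def mul(c, d):
    return define_by_recursion(c, lambda x: add(x, c))(d)


sample = range(1, 5)
triples = list(product(sample, repeat=3))

checks = {
    "(1) cancellation for +": all(a == b for a, b, c in triples if add(a, c) == add(b, c)),
    "(2) associativity of +": all(add(add(a, b), c) == add(a, add(b, c)) for a, b, c in triples),
    "(4) commutativity of +": all(add(a, b) == add(b, a) for a, b, _ in triples),
    "(5) a + b != 1": all(add(a, b) != 1 for a, b, _ in triples),
    "(8) distributivity": all(mul(add(a, b), c) == add(mul(a, c), mul(b, c)) for a, b, c in triples),
    "(12) cancellation for *": all(a == b for a, b, c in triples if mul(a, c) == mul(b, c)),
}
for name, ok in checks.items():
    print(name, ok)
```

Each check prints True on this sample. A finite check of this kind cannot replace the inductive arguments using Part (c) of the Peano Postulates.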

Addition and multiplication are the two most important binary operations on the 
natural numbers. The most important relations on the natural numbers are less than 
and less than or equal to, to which we now turn. 


Definition 1.2.8. The relation < on N is defined by a < b if and only if there is some
p ∈ N such that a + p = b, for all a, b ∈ N. The relation ≤ on N is defined by a ≤ b if
and only if a < b or a = b, for all a, b ∈ N. △


We will write a > b to mean the same thing as b < a, and similarly for a ≥ b.
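
Definition 1.2.8 can be read as a search for a witness p. The following Python sketch models that reading, assuming natural numbers are Python integers starting at 1 and using Python's built-in addition as a stand-in for the addition of Theorem 1.2.5; the function names are hypothetical.

```python
def less_than(a: int, b: int) -> bool:
    """a < b if and only if a + p = b for some p in N = {1, 2, 3, ...}."""
    # Assumption of this model: any witness p is smaller than b, so the search is finite.
    return any(a + p == b for p in range(1, b))


def less_equal(a: int, b: int) -> bool:
    """a <= b if and only if a < b or a = b."""
    return less_than(a, b) or a == b


print(less_than(2, 5))   # True, with witness p = 3
print(less_than(5, 5))   # False
print(less_equal(5, 5))  # True
```

The definition uses only addition and the existence of p; no prior notion of order on N is needed.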

As expected, if a, b ∈ N and a < b, then the element p ∈ N such that a + p = b
is unique; the proof of this fact is left to the reader in Exercise 1.2.3. The following
theorem gives some of the basic properties of < and ≤.


Theorem 1.2.9. Let a, b, c, d ∈ N.

1. a ≤ a, and a ≮ a, and a < a + 1.
2. 1 ≤ a.
3. If a < b and b < c, then a < c; if a ≤ b and b < c, then a < c; if a < b and
b ≤ c, then a < c; if a ≤ b and b ≤ c, then a ≤ c.
4. a < b if and only if a + c < b + c.
5. a < b if and only if ac < bc.
6. Precisely one of a < b or a = b or a > b holds (Trichotomy Law).
7. a ≤ b or b ≤ a.
8. If a ≤ b and b ≤ a, then a = b.
9. It cannot be that b < a < b + 1.
10. a ≤ b if and only if a < b + 1.
11. a < b if and only if a + 1 ≤ b.


Proof. We will prove Parts (2), (6), (7), (8), (9) and (10), leaving the rest to the reader 
in Exercise 1.2.4. 


(2) There are two cases. First, suppose that a = 1. Then certainly 1 ≤ a. Second,
suppose that a ≠ 1. By Lemma 1.2.3 there is some p ∈ N such that a = p + 1. Hence
a = 1 + p, which means that 1 < a, and therefore 1 ≤ a.


(6) We first show that no two of a < b and a = b and a > b can hold simultane-
ously. Suppose that a < b and a = b. It then follows from Part (3) of this theorem that
a < a, which is a contradiction to Part (1) of this theorem. A similar argument shows
that it is not the case that a > b and a = b. Now suppose that a < b and b < a. By
Part (3) of this theorem we deduce that a < a, again a contradiction.

We now show that at least one of a < b or a = b or a > b holds. Let


G = {x ∈ N | if y ∈ N, then x < y or x = y or x > y}.


We will show that G = N, and the desired result will follow. We start by showing
that 1 ∈ G. Let j ∈ N. By Part (2) of this theorem we know that 1 ≤ j. It follows that
1 = j or 1 < j. Hence 1 ∈ G. Now suppose that k ∈ G; we will show that k + 1 ∈ G.
Let j ∈ N. By hypothesis on k we know that k < j or k = j or k > j. First suppose
that k < j. Then there is some p ∈ N such that k + p = j. If p = 1, then k + 1 = j;
if p ≠ 1, then by Lemma 1.2.3 and Theorem 1.2.5 (a) there is some r ∈ N such that
p = r + 1, which implies that k + (r + 1) = j, which by Theorem 1.2.7 (2) (4) implies
(k + 1) + r = j, and from that we deduce that k + 1 < j. Next suppose that k = j. Then
by Part (1) of this theorem it follows that k + 1 > k = j. Finally, suppose that k > j.
Once again we know that k + 1 > k, and by Part (3) of this theorem it follows that
k + 1 > j. Putting all three cases together, we see that one of k + 1 < j or k + 1 = j or
k + 1 > j always holds. Hence k + 1 ∈ G, and by Part (c) of the Peano Postulates we
conclude that G = N.


(7) & (8) These follow directly from Part (6). 


(9) Suppose that b < a < b + 1. Then there are g, h ∈ N such that b + g = a and
that a + h = b + 1. Then (b + g) + h = b + 1. By Theorem 1.2.7 (2) (4) we see that
(g + h) + b = 1 + b, and by Theorem 1.2.7 (1) we then conclude that g + h = 1. This
last statement contradicts Theorem 1.2.7 (5).


(10) First suppose that a ≤ b. Suppose further that a ≥ b + 1. Then by Part (3) of
this theorem we deduce that b + 1 ≤ b, which is a contradiction to Parts (1) and (6) of
this theorem. Second, suppose that a < b + 1. Suppose further that a > b. It follows
that b < a < b + 1, which is a contradiction to Part (9) of this theorem.


Part (9) of Theorem 1.2.9 says that the natural numbers are “discrete,” a feature 
not shared by the rational numbers or the real numbers. The integers are also discrete 
in the sense of Theorem 1.2.9 (9), though there is a prominent difference between the 
natural numbers and the integers, which is, intuitively, that the latter “goes to infinity” 
in two directions, whereas the former does so in only one direction. This last property 
of the natural numbers, combined with the notion of discreteness, form the intuitive 
basis for the following theorem, which says that every non-empty subset of N has a 
smallest element (though not necessarily a largest one). 


Theorem 1.2.10 (Well-Ordering Principle). Let G ⊆ N be a non-empty set. Then
there is some m ∈ G such that m ≤ g for all g ∈ G.


Proof. Suppose that there is no m ∈ G such that m ≤ g for all g ∈ G. We will derive
a contradiction. Let


H = {a ∈ N | if n ∈ N and n ≤ a, then n ∉ G}.


It follows from the definition of H that H ∩ G = ∅. We will show that H = N, and it
will then follow that G is empty, the desired contradiction.

Suppose that 1 ∉ H. Then there is some q ∈ N such that q ≤ 1 and q ∈ G. By
Theorem 1.2.9 (2) (8) we see that q = 1. Hence 1 ∈ G. It then follows from Theo-
rem 1.2.9 (2) that G has an element, namely, the number m = 1, such that m ≤ g
for all g ∈ G, which is a contradiction to our hypothesis that no such element exists.
Therefore 1 ∈ H.

Now suppose that a ∈ H. Suppose further that a + 1 ∉ H. Then there is some
p ∈ N such that p ≤ a + 1 and p ∈ G. If it were the case that p ≤ a, then we would
have a contradiction to the fact that a ∈ H. Hence, by Theorem 1.2.9 (6) we deduce
that a < p. Therefore a < p ≤ a + 1, and it follows from Part (9) of the same theorem
that p = a + 1. Hence a + 1 ∈ G. Now let x ∈ G. Suppose that x < a + 1. Then x ≤ a
by Theorem 1.2.9 (10). Because a ∈ H it follows that x ∉ G, which is a contradiction.
Hence a + 1 ≤ x by Theorem 1.2.9 (6). We now have a contradiction to the fact that no
element such as a + 1 exists in G. It follows that a + 1 ∈ H, and hence that H = N.
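
The idea behind the Well-Ordering Principle can be sketched as an upward search from 1. The following Python function is an illustration only: it assumes that G is given as a finite, non-empty Python set of positive integers, and the name least_element is hypothetical.

```python
def least_element(G):
    """Return m in G with m <= g for all g in G, for a non-empty set G of natural numbers."""
    if not G or any(g < 1 for g in G):
        raise ValueError("G must be a non-empty set of natural numbers 1, 2, 3, ...")
    m = 1
    while m not in G:
        # Here every n <= m lies outside G, so m belongs to the set H of the proof.
        m = m + 1
    return m


print(least_element({7, 3, 12}))  # 3
```

The loop terminates because G is non-empty; the proof above shows, by contradiction, that the corresponding search cannot run through all of N.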


Reflections 


We have taken the Peano Postulates for the natural numbers as axiomatic, but in 
fact it is possible to derive the existence of a system satisfying the Peano Postulates 
from the Zermelo–Fraenkel axioms for set theory; see [Vau95, Chapters 2–3] or
[Mor87, Chapter 5] for details. Of course, one has to start somewhere axiomatically,
and from the perspective of real analysis it makes sense to start with the Peano
Postulates, which describe a simple and familiar set of numbers, rather than with the
much less intuitive Zermelo–Fraenkel axioms for set theory, which would also take
us rather far afield from calculus. 

The material in this section looks a bit easier than it really is, because we omitted
the proof of Definition by Recursion (Theorem 1.2.4), which is trickier and lengthier
than any other proof in the section. That proof is given in Section 2.5, where it is
seen by all readers of the text, no matter which entry they used. Although the proof
was omitted from the present section for organizational purposes, and also to allow
for a smoother development of the natural numbers, Definition by Recursion is
essential: it is needed to define addition and multiplication for the natural numbers.
The reader who wants to see all the details of a rigorous treatment of the natural
numbers right now should jump ahead and read the proof of Definition by Recursion
before proceeding to the next section.


Exercises 


Exercise 1.2.1. [Used in Theorem 1.2.6.] Fill in the missing details in the proof of 
Theorem 1.2.6. 


Exercise 1.2.2. [Used in Theorem 1.2.7.] Prove Theorem 1.2.7 (2) (3) (4) (7) (8) (9) 
(10) (11) (13). 


Exercise 1.2.3. [Used in Section 1.2.] Let a, b ∈ N. Suppose that a < b. Prove that
there is a unique p ∈ N such that a + p = b.


Exercise 1.2.4. [Used in Theorem 1.2.9.] Prove Theorem 1.2.9 (1) (3) (4) (5) (11). 


Exercise 1.2.5. [Used in Exercise 1.3.3.] Let a, b ∈ N. Prove that if a + a = b + b, then
a = b.


Exercise 1.2.6. Let b ∈ N. Prove that


{n ∈ N | 1 ≤ n ≤ b} ∪ {n ∈ N | b + 1 ≤ n} = N
and
{n ∈ N | 1 ≤ n ≤ b} ∩ {n ∈ N | b + 1 ≤ n} = ∅.


Exercise 1.2.7. Let A ⊆ N be a set. The set A is closed if a ∈ A implies a + 1 ∈ A.
Suppose that A is closed.


(1) Prove that if a ∈ A and n ∈ N, then a + n ∈ A.
(2) Prove that if a ∈ A, then {x ∈ N | x ≥ a} ⊆ A.


Exercise 1.2.8. [Used in Section 1.2.] Suppose that the set N together with the element
1 ∈ N and the function s: N → N, and that the set N′ together with the element 1′ ∈ N′
and the function s′: N′ → N′, both satisfy the Peano Postulates. Prove that there is a
bijective function f: N → N′ such that f(1) = 1′ and f ∘ s = s′ ∘ f. The existence of
such a bijective function proves that the natural numbers are essentially unique.

The existence of the function f follows immediately from the existence part of 
Theorem 1.2.4; the trickier aspect of this exercise is to prove that f is bijective. To 
do that, find an inverse for f by using the existence part of Theorem 1.2.4 again, and 
then prove that the function you found is an inverse of f by using the uniqueness part 
of Theorem 1.2.4. 


The natural numbers have many nice properties, but there are some things obviously
missing from them. There is nothing in the natural numbers that plays the role of the 
number zero (Theorem 1.2.7 (6) rules out any such number), and there is nothing that 
plays the role of negative numbers (the definition of less than for the natural numbers 
means that if any two natural numbers are added, the result is a number greater than 
each of the original two numbers). We could simply try to adjoin zero and the negative 
numbers to the set of natural numbers by brute force, but doing so would leave us 
unsure that we are on safe ground, unless we assume additional axiomatic properties, 
which we would prefer not to do. Instead, we will construct a new set of numbers, 
the integers, out of the set of natural numbers, and we will show that this new set 
contains a copy of the natural numbers together with zero and the negatives of the 
natural numbers, and that this new set of numbers obeys all the expected properties. 
The construction and proofs in this section are quite different from those seen in 
Section 1.2. For the sake of brevity, we will use the results of Section 1.2 without 
always giving explicit references to theorems and lemmas whenever we use standard 
properties of the natural numbers (for example, the fact that a + b = b + a for all
a, b ∈ N). A crucial tool needed in the construction of the integers, which we did not
use in Section 1.2, is the concept of equivalence relations and equivalence classes. 
The intuitive idea in our construction is that we can think of an integer as given 
by an expression of the form “a − b,” where a, b ∈ N. Because we do not have the
operation subtraction on N, we replace “a − b” with the pair (a,b). It could happen,
however, that “a − b” equals “c − d” for some a, b, c, d ∈ N, where a ≠ c and b ≠ d.
Then both pairs (a,b) and (c,d) ought to represent the same integer. To take care of
this problem we define the following relation on N × N.


Definition 1.3.1. The relation ~ on N × N is defined by (a,b) ~ (c,d) if and only if
a + d = b + c, for all (a,b), (c,d) ∈ N × N. △


Lemma 1.3.2. The relation ~ is an equivalence relation on N × N.


Proof. We will prove reflexivity and symmetry, leaving transitivity to the reader
in Exercise 1.3.2. Let (a,b), (c,d) ∈ N × N. We note that a + b = b + a, and hence
(a,b) ~ (a,b). Therefore ~ is reflexive. Now suppose that (a,b) ~ (c,d). Then
a + d = b + c. Hence c + b = d + a, and therefore (c,d) ~ (a,b). It follows that ~ is
symmetric.
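
As an informal sanity check of Definition 1.3.1 and Lemma 1.3.2, the relation ~ can
be tested on a finite sample of pairs. The following Python sketch is not part of the
formal development; the function name equiv and the sample bound 6 are illustrative
assumptions, and a finite check is of course no substitute for the proof.

```python
from itertools import product

def equiv(p, q):
    """(a, b) ~ (c, d) if and only if a + d = b + c (Definition 1.3.1)."""
    (a, b), (c, d) = p, q
    return a + d == b + c

# A finite sample of N x N; the bound 6 is an arbitrary choice.
pairs = list(product(range(1, 7), repeat=2))

reflexive = all(equiv(p, p) for p in pairs)
symmetric = all(equiv(q, p) for p in pairs for q in pairs if equiv(p, q))
transitive = all(
    equiv(p, r)
    for p in pairs for q in pairs for r in pairs
    if equiv(p, q) and equiv(q, r)
)
```

For instance, the pairs (3,1) and (5,3) are related, which matches the intuition that
both stand for “3 − 1” and “5 − 3,” whereas (3,1) and (1,3) are not related.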


We are now ready for the definition of the set of integers, together with addition, 
multiplication, negative and the relations less than, and less than or equal to. We will 
use the standard symbols +, ·, −, < and ≤ in the definition, though we need to be
careful to note that these symbols formally mean different things when used with 
the integers from when they are used with the natural numbers. (Moreover, it is not 
possible to “add” a natural number and an integer using addition as we have defined 
it; this problem will be resolved when we see that a copy of the natural numbers sits 
inside the set of integers.) Recall the definition of binary and unary operations given 
in Section 1.1. 


Definition 1.3.3. The set of integers, denoted Z, is the set of equivalence classes of
N × N with respect to the equivalence relation ~.

The elements 0̂, 1̂ ∈ Z are defined by 0̂ = [(1,1)] and 1̂ = [(1 + 1, 1)]. The binary
operations + and · on Z are defined by


[(a,b)] + [(c,d)] = [(a + c, b + d)]
[(a,b)] · [(c,d)] = [(ac + bd, ad + bc)]


for all [(a,b)], [(c,d)] ∈ Z. The unary operation − on Z is defined by −[(a,b)] =
[(b,a)] for all [(a,b)] ∈ Z. The relation < on Z is defined by [(a,b)] < [(c,d)] if
and only if a + d < b + c, for all [(a,b)], [(c,d)] ∈ Z. The relation ≤ on Z is defined
by [(a,b)] ≤ [(c,d)] if and only if [(a,b)] < [(c,d)] or [(a,b)] = [(c,d)], for all
[(a,b)], [(c,d)] ∈ Z. △
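
To make Definition 1.3.3 concrete, the following Python sketch models an integer as
an equivalence class of pairs of natural numbers. The class name Z, the constants
ZERO and ONE, and the choice of a canonical representative with smaller component
equal to 1 are assumptions made for illustration only. The normalization step uses
Python's built-in subtraction as a shortcut, which the formal construction does not
have available.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Z:
    """The class [(a, b)], stored via a canonical representative."""
    a: int
    b: int

    def __post_init__(self):
        if self.a < 1 or self.b < 1:
            raise ValueError("components must be natural numbers (>= 1)")
        # Shift both components so that the smaller one is 1; pairs in the
        # same equivalence class then compare equal as dataclass instances.
        shift = min(self.a, self.b) - 1
        object.__setattr__(self, "a", self.a - shift)
        object.__setattr__(self, "b", self.b - shift)

    def __add__(self, other):
        return Z(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        return Z(self.a * other.a + self.b * other.b,
                 self.a * other.b + self.b * other.a)

    def __neg__(self):
        return Z(self.b, self.a)

    def __lt__(self, other):
        return self.a + other.b < self.b + other.a

    def __le__(self, other):
        return self < other or self == other

ZERO = Z(1, 1)      # the class [(1, 1)]
ONE = Z(1 + 1, 1)   # the class [(1 + 1, 1)]
```

With these definitions, Z(3, 1) and Z(5, 3) are equal, Z(3, 1) + Z(1, 4) equals
Z(1, 2), and Z(3, 1) * Z(1, 4) equals Z(1, 7), in line with the informal readings
2 + (−3) = −1 and 2 · (−3) = −6.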


As is often the case with definitions involving equivalence classes, we need to
check whether the binary operations + and ·, the unary operation −, and the relation
< are well-defined. For example, we defined [(a,b)] + [(c,d)] to be [(a + c, b + d)],
where [(a,b)], [(c,d)] ∈ Z, but for this definition to make sense, we need to verify
that if [(a,b)] = [(x,y)] and [(c,d)] = [(z,w)], then [(a + c, b + d)] = [(x + z, y + w)].
In other words, we need to verify that the sum of the equivalence classes [(a,b)]
and [(c,d)] depends only upon the equivalence classes themselves, and not upon the
particular elements that are used to represent the equivalence classes. As seen in the
following lemma, everything works out well. (We do not need to deal with the relation
≤ in the lemma, because that is defined in terms of <.)


Lemma 1.3.4. The binary operations + and ·, the unary operation −, and the relation
<, all on Z, are well-defined.


Proof. We will show that + and < are well-defined; the other parts of the lemma are 
left to the reader in Exercise 1.3.3. 

Let (a,b), (c,d), (x,y), (z,w) ∈ N × N. Suppose that [(a,b)] = [(x,y)] and [(c,d)] =
[(z,w)].

By hypothesis we know that (a,b) ~ (x,y) and (c,d) ~ (z,w). Hence a + y = b + x
and c + w = d + z. By adding these two equations and doing some rearranging we
obtain (a + c) + (y + w) = (b + d) + (x + z), and we deduce that [(a + c, b + d)] =
[(x + z, y + w)]. Therefore + is well-defined.

Now suppose that [(a,b)] < [(c,d)]. Therefore a + d < b + c. Adding b + x = a + y
and c + w = d + z to this inequality, we obtain a + d + b + x + c + w < b + c + a +
y + d + z. Canceling yields x + w < y + z, and it follows that [(x,y)] < [(z,w)]. This
process can be done backwards, and hence [(x,y)] < [(z,w)] implies [(a,b)] < [(c,d)].
Therefore [(a,b)] < [(c,d)] if and only if [(x,y)] < [(z,w)], which means that < is
well-defined.
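
The meaning of “well-defined” can be illustrated by a finite experiment. The Python
sketch below, whose function names and sample bound are assumptions chosen for
illustration, checks that replacing representatives by equivalent ones changes neither
the class of a sum nor the truth of a “less than” statement. For contrast, it also
includes a rule, taking the first component of a pair, that does depend on the
representative and therefore would not define a function on Z.

```python
from itertools import product

def equiv(p, q):
    """(a, b) ~ (c, d) if and only if a + d = b + c."""
    (a, b), (c, d) = p, q
    return a + d == b + c

def add(p, q):
    """Representative of [(a, b)] + [(c, d)]."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def less(p, q):
    """[(a, b)] < [(c, d)] if and only if a + d < b + c."""
    (a, b), (c, d) = p, q
    return a + d < b + c

def first(p):
    """A rule that depends on the chosen representative."""
    return p[0]

# A small finite sample of N x N; the bound 4 is arbitrary.
pairs = list(product(range(1, 5), repeat=2))

well_defined = all(
    equiv(add(p, q), add(p2, q2)) and less(p, q) == less(p2, q2)
    for p, p2, q, q2 in product(pairs, repeat=4)
    if equiv(p, p2) and equiv(q, q2)
)
```

Here (1,1) ~ (2,2), yet first((1,1)) and first((2,2)) differ, which is exactly the
kind of failure that Lemma 1.3.4 rules out for +, ·, − and <.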


We now turn to the basic algebraic properties of the integers. The idea of the proof 
of the following theorem is to rephrase things in terms of natural numbers, and then 
use the appropriate facts we have already proved about the natural numbers. As is 
usual, we will write “xy” instead of “x · y,” except in cases of potential ambiguity, or
for ease of reading. We will write x > y to mean the same thing as y < x, and similarly
for x ≥ y.


Theorem 1.3.5. Let x, y, z ∈ Z.


1. (x + y) + z = x + (y + z) (Associative Law for Addition).
2. x + y = y + x (Commutative Law for Addition).
3. x + 0̂ = x (Identity Law for Addition).
4. x + (−x) = 0̂ (Inverses Law for Addition).
5. (xy)z = x(yz) (Associative Law for Multiplication).
6. xy = yx (Commutative Law for Multiplication).
7. x · 1̂ = x (Identity Law for Multiplication).
8. x(y + z) = xy + xz (Distributive Law).
9. If xy = 0̂, then x = 0̂ or y = 0̂ (No Zero Divisors Law).
10. Precisely one of x < y or x = y or x > y holds (Trichotomy Law).
11. If x < y and y < z, then x < z (Transitive Law).
12. If x < y then x + z < y + z (Addition Law for Order).
13. If x < y and z > 0̂, then xz < yz (Multiplication Law for Order).
14. 0̂ ≠ 1̂ (Non-Triviality).


Proof. We will prove Parts (2), (9) and (12), leaving the rest to the reader in
Exercise 1.3.5. Suppose that x = [(a,b)], that y = [(c,d)] and that z = [(e,f)], for some
a, b, c, d, e, f ∈ N.


(2) Using the definition of addition of integers, we see that x + y = [(a,b)] +
[(c,d)] = [(a + c, b + d)] = [(c + a, d + b)] = [(c,d)] + [(a,b)] = y + x, where the
middle equality holds by Theorem 1.2.7 (4).


(9) Suppose that xy = 0̂ and that x ≠ 0̂. We will deduce that y = 0̂. Using the
definition of multiplication of integers, we see that xy = [(ac + bd, ad + bc)]. It then
follows from Exercise 1.3.4 (1) that a ≠ b and ac + bd = ad + bc. By Theorem 1.2.9 (6)
we know that either a < b or a > b. First, suppose that a < b. Then there is some
g ∈ N such that a + g = b. Hence ac + (a + g)d = ad + (a + g)c. By rearranging and
canceling we deduce that d = c. Exercise 1.3.4 (1) then implies that y = 0̂. The case
where a > b is similar to the previous case, and we omit the details.


(12) Suppose that x < y. Then [(a,b)] < [(c,d)], and therefore a + d < b + c.
Hence (a + d) + (e + f) < (b + c) + (e + f), and therefore (a + e) + (d + f) < (b + f) +
(c + e). It follows that [(a + e, b + f)] < [(c + e, d + f)], and hence [(a,b)] + [(e,f)] <
[(c,d)] + [(e,f)], which means that x + z < y + z.


We now have two sets of numbers, namely, the natural numbers (as defined in 
Section 1.2) and the integers (as constructed in the present section). As we defined 
these two sets, they are entirely disjoint. Of course, intuitively we think of the natural 
numbers as sitting inside the integers, and we now show that although formally these 
two sets of numbers are distinct, a copy of the natural numbers sits inside the integers, 
and this copy is just as good as the original. To find this copy of the natural numbers, 
we start with the following very simple definition. 


Definition 1.3.6. Let x ∈ Z. The number x is positive if x > 0̂, and the number x is
negative if x < 0̂. △


As we now make precise in the following theorem, the set of positive integers 
can be viewed as a copy of the natural numbers, a fact that certainly fits with our 
intuition. More formally, we will show that there is a bijective function between the 
natural numbers and the set of positive integers that preserves the number 1, the binary
operations addition and multiplication, and the relation less than. 


Theorem 1.3.7. Let i: N → Z be defined by i(n) = [(n + 1, 1)] for all n ∈ N.


1. The function i: N → Z is injective.
2. i(N) = {x ∈ Z | x > 0̂}.
3. i(1) = 1̂.
4. Let a, b ∈ N. Then
   a. i(a + b) = i(a) + i(b);
   b. i(ab) = i(a)i(b);
   c. a < b if and only if i(a) < i(b).


Proof. We prove Parts (2) and (4a), leaving the rest to the reader in Exercise 1.3.6. 


(2) Let y ∈ i(N). Then y = [(p + 1, 1)] for some p ∈ N. By Part (a) of the
Peano Postulates we know that p + 1 ≠ 1. It then follows from Exercise 1.3.4 (3) that
[(p + 1, 1)] > 0̂, and we deduce that y ∈ {x ∈ Z | x > 0̂}. Hence i(N) ⊆ {x ∈ Z | x > 0̂}.

Now let z ∈ {x ∈ Z | x > 0̂}. Again using Exercise 1.3.4 (3) we deduce that
z = [(q, 1)] for some q ∈ N such that q ≠ 1. By Lemma 1.2.3 we know that q = c + 1
for some c ∈ N, and hence z = [(c + 1, 1)], which in turn implies that z ∈ i(N). We
deduce that i(N) ⊇ {x ∈ Z | x > 0̂}.


(4a) Using the definitions of addition for Z, the relation ~ and the function i,
together with Theorem 1.2.7, we see that


i(a) + i(b) = [(a + 1, 1)] + [(b + 1, 1)] = [((a + 1) + (b + 1), 1 + 1)]
            = [(((a + b) + 1) + 1, 1 + 1)] = [((a + b) + 1, 1)] = i(a + b).
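
The properties in Theorem 1.3.7 can be spot-checked numerically. In the Python
sketch below, an integer is stored as a pair of natural numbers normalized so that
the smaller component is 1. The helper names norm, add, mul and less, as well as the
sample range, are illustrative assumptions, and the check covers only finitely many
cases.

```python
def norm(p):
    """Canonical representative of [(a, b)]: the smaller component becomes 1."""
    a, b = p
    shift = min(a, b) - 1
    return (a - shift, b - shift)

def add(p, q):
    return norm((p[0] + q[0], p[1] + q[1]))

def mul(p, q):
    (a, b), (c, d) = p, q
    return norm((a * c + b * d, a * d + b * c))

def less(p, q):
    (a, b), (c, d) = p, q
    return a + d < b + c

def i(n):
    """The map i: N -> Z of Theorem 1.3.7, i(n) = [(n + 1, 1)]."""
    return norm((n + 1, 1))

sample = range(1, 8)  # an arbitrary finite sample of natural numbers

injective = len({i(n) for n in sample}) == len(sample)
preserves_add = all(i(a + b) == add(i(a), i(b)) for a in sample for b in sample)
preserves_mul = all(i(a * b) == mul(i(a), i(b)) for a in sample for b in sample)
preserves_lt = all((a < b) == less(i(a), i(b)) for a in sample for b in sample)
```

For example, i(2) + i(3) is computed as [(3 + 4, 1 + 1)] = [(7, 2)], whose canonical
representative (6, 1) is the same as that of i(5).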


Because Theorem 1.3.7 implies that from the point of view of addition, multipli- 
cation and the relation less than, we can identify the natural numbers with the positive 
integers, we can therefore dispense with the set of natural numbers as a separate 
entity (except when we need it in proofs), because we have a copy of the natural 
numbers inside the integers that works just as well. Everything that was proved about 
the natural numbers in Section 1.2 that involves only addition, multiplication and
the relation less than (and therefore also the relation ≤, which is derived from <),
still holds true when we think of the natural numbers as the set of positive integers. 
For example, the reader is asked in Exercise 1.3.10 to give a detailed proof, using 
Theorem 1.3.7, that the Well-Ordering Principle (Theorem 1.2.10), which was stated 
for N in Section 1.2, still holds when we think of N as the set of positive integers. 

From this point on, we will use the symbol N to denote the set of positive integers, 
and we will dispense with the notation 0̂ and 1̂, and simply write 0 and 1 instead. If we
let −N denote the set of negative integers, then it follows from the Trichotomy Law
(Theorem 1.3.5 (10)) that Z = −N ∪ {0} ∪ N, and that these three sets are pairwise
disjoint. 
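
In terms of representatives, the decomposition of Z into negative integers, zero and
positive integers amounts to comparing the two components of a pair. The following
Python sketch, in which the function name classify is an assumption made for
illustration, decides which of the three sets contains the class [(a,b)] by comparing
it with [(1,1)] using only the order on the natural numbers.

```python
def classify(a, b):
    """Classify the class [(a, b)] as positive, negative or zero."""
    # By definition, [(1, 1)] < [(a, b)] if and only if 1 + b < 1 + a.
    if 1 + b < 1 + a:
        return "positive"
    # Likewise, [(a, b)] < [(1, 1)] if and only if a + 1 < b + 1.
    if a + 1 < b + 1:
        return "negative"
    return "zero"
```

Equivalent pairs receive the same answer; for instance (4,1) and (7,4) are both
classified as positive, which is consistent with Exercise 1.3.4 (3).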

We conclude this section with a few more properties of the integers that we 
will need later on. In contrast to the properties in Theorem 1.3.5, the proofs of 
which required going back to the definition of the integers as equivalence classes, 
the properties in the following lemma do not require the use of the definition of the 
integers, but rather are proved directly from the properties in Theorem 1.3.5. As 
such, Theorem 1.3.5 contains more fundamental properties of the integers. Indeed, in 
Section 1.4 (which the reader of the current section should skip), it is seen that what 
we have stated in the present section as Theorem 1.3.5 is taken as part of the axioms 
for the integers, if one starts with such axioms rather than the Peano Postulates. As is 
usual, we will write “−xy” when we mean “−(xy).”


Lemma 1.3.8. Let x, y, z ∈ Z.


1. If x + z = y + z, then x = y (Cancellation Law for Addition).
2. −(−x) = x.
3. −(x + y) = (−x) + (−y).
4. x · 0 = 0.
5. If z ≠ 0 and if xz = yz, then x = y (Cancellation Law for Multiplication).
6. (−x)y = −xy = x(−y).
7. xy = 1 if and only if x = 1 = y or x = −1 = y.
8. x > 0 if and only if −x < 0, and x < 0 if and only if −x > 0.
9. 0 < 1.
10. If x ≤ y and y ≤ x, then x = y.
11. If x > 0 and y > 0, then xy > 0. If x > 0 and y < 0, then xy < 0.

Proof. We will prove Parts (2), (6), (8) and (9), leaving the rest to the reader in 
Exercise 1.3.11. 


(2) By the Inverses Law for Addition, we know that (−x) + [−(−x)] = 0. It
follows that x + {(−x) + [−(−x)]} = x + 0. By the Associative and Identity Laws for
Addition it follows that {x + (−x)} + [−(−x)] = x, and hence by the Inverses Law
for Addition we see that 0 + [−(−x)] = x. We now use the Commutative and Identity
Laws for Addition to deduce that −(−x) = x.


(6) By the Inverses Law for Addition, we know that x + (−x) = 0. We then
use the Distributive Law to deduce that yx + y(−x) = y[x + (−x)] = y · 0. It follows
from Part (4) of this lemma that yx + y(−x) = 0. By the Commutative Law for
Multiplication we obtain xy + (−x)y = 0. By the Inverses Law for Addition we know
that xy + (−xy) = 0. Hence xy + (−x)y = xy + (−xy), and using the Commutative
Law for Addition we see that (−x)y + xy = (−xy) + xy. Using Part (1) of this lemma
we deduce that (−x)y = −xy. A similar argument shows that x(−y) = −xy.


(8) Suppose that 0 < x. Then by the Addition Law for Order we see that 0 +
(−x) < x + (−x). By the Commutative, Identity and Inverses Laws for Addition we
deduce that −x < 0. A similar argument shows that −x < 0 implies 0 < x, and also
shows that x < 0 if and only if −x > 0.


(9) By Non-Triviality we know that 0 ≠ 1. It therefore follows from the
Trichotomy Law that either 0 < 1 or 0 > 1, but not both. Suppose that 0 > 1. By
Part (8) of this lemma we deduce that 0 < −1. It follows from the Multiplication
Law for Order that 0 · (−1) < (−1)(−1). The Commutative Law for Multiplication
together with Part (4) of this lemma imply that 0 < (−1)(−1). By
Part (6) of this lemma together with the Identity Law for Multiplication we see
that (−1)(−1) = −[(−1) · 1] = −(−1), and by Part (2) of this lemma we deduce that
(−1)(−1) = 1. We conclude that 0 < 1, which is a contradiction to our assumption
that 0 > 1. Therefore 0 > 1 is impossible, and hence 0 < 1.


Whereas the proof of Lemma 1.3.8 makes use of only the properties of the
integers given in Theorem 1.3.5, it turns out that not all properties of the integers
can be deduced from that theorem. For the following result, which shows that the
integers are intuitively “discrete,” we need to make use of not only Theorem 1.3.5 (and
Lemma 1.3.8, which is derived from Theorem 1.3.5), but also Theorem 1.2.9 (9),
which by Theorem 1.3.7 holds for the set N when viewed as a subset of Z. The
following theorem cannot be deduced from Theorem 1.3.5 alone because the set of
real numbers R satisfies all the properties in Theorem 1.3.5, and yet the following
theorem would not be true if Z were replaced with R.


Theorem 1.3.9. Let x ∈ Z. Then there is no y ∈ Z such that x < y < x + 1.


Proof. Suppose that there is some y ∈ Z such that x < y < x + 1. By the Addition
Law for Order we see that x + [(−x) + 1] < y + [(−x) + 1] < [x + 1] + [(−x) + 1], and
then by repeated use of the Associative, Commutative, Identity and Inverses Laws
for Addition we deduce that 1 < y + [(−x) + 1] < 1 + 1. By Lemma 1.3.8 (9) we
know that 0 < 1, and it then follows from the Transitive Law that 0 < y + [(−x) + 1].
Hence 1 and y + [(−x) + 1] are both in N, and we therefore have a contradiction to
Theorem 1.2.9 (9).


Finally, we note that because 1 ∈ Z, we can now use all the familiar integers such
as the number 2, which is defined to be 1 + 1; the number 3, which is defined to be 
2+ 1; the number 4, which is defined to be 3+ 1; and so on. 

The reader who has read Sections 1.2 and 1.3 should now skip Section 1.4, where 
an axiomatic approach to the integers is taken as an alternative to what we saw in the 
present section and the previous one, and should proceed straight to Section 1.5 for 
the construction of the rational numbers. 


Reflections 


At first glance this section might appear to be much ado about nothing. We defined 
the natural numbers in Section 1.2, and the reader might wonder why we could not 
simply define the integers by letting the negative numbers be the negations of the 
natural numbers, and letting zero be an additional number that works as expected 
with respect to addition. That approach, while in principle workable, turns out to 
be not really all that simple. First, there is no operation of negation defined for the 
natural numbers in Section 1.2, so we cannot use such an operation until we define
it, and defining the negation of the natural numbers is tantamount to already having 
the negative numbers defined. It would be possible to proceed formally by defining 
a set, called the set of negative natural numbers, by simply having one such number
corresponding to each element of the set of natural numbers. However, doing so leads 
to the problem of defining addition on the set of integers, that is, the set containing 
the natural numbers, the formally defined negative natural numbers and a formally 
defined zero. Such a definition would require various cases, depending upon which 
type of numbers are being added, with the main problem occurring when adding a 
natural number and a negative natural number. To define the sum of such numbers, it 
would be necessary to find the difference between the given natural number and the 
natural number that corresponds to the given negative number, which could be done 
via Exercise 1.2.3. It would then be necessary to define multiplication on the set of 
integers, which again requires various cases, and also to prove Theorem 1.3.5 using 
these definitions of addition and multiplication, at which point things become more or 
less as complicated as the method used in the text. Hence, we might as well stick with 
the method in the text, because it does not involve a lot of cases, because after some 
familiarity it is seen to be quite natural and because it is analogous to the construction 
of the rational numbers from the integers used in Section 1.5. 


Exercises


Exercise 1.3.1. Let ~ be the relation on N × N defined by (a,b) ~ (c,d) if and only
if a²d = c²b, for all (a,b), (c,d) ∈ N × N, where n² is an abbreviation for n · n.


(1) Prove that ~ is an equivalence relation. 
(2) List or describe all the elements in [(2,3)]. 


Exercise 1.3.2. [Used in Lemma 1.3.2.] Complete the proof of Lemma 1.3.2. That is, 
prove that the relation ~ is transitive. 


Exercise 1.3.3. [Used in Lemma 1.3.4.] Complete the proof of Lemma 1.3.4. That is, 
prove that · and − for Z are well-defined. The proof for · is a bit more complicated
than might be expected. [Use Exercise 1.2.5.] 


Exercise 1.3.4. [Used in Theorem 1.3.5 and Theorem 1.3.7.] Let a, b ∈ N.


(1) Prove that [(a,b)] = 0̂ if and only if a = b.

(2) Prove that [(a,b)] = 1̂ if and only if a = b + 1.

(3) Prove that [(a,b)] = [(n, 1)] for some n ∈ N such that n ≠ 1 if and only if a > b
if and only if [(a,b)] > 0̂.

(4) Prove that [(a,b)] = [(1, m)] for some m ∈ N such that m ≠ 1 if and only if
a < b if and only if [(a,b)] < 0̂.


Exercise 1.3.5. [Used in Theorem 1.3.5.] Prove Theorem 1.3.5 (1) (3) (4) (5) (6) (7) 
(8) (10) (11) (13) (14).


Exercise 1.3.6. [Used in Theorem 1.3.7.] Prove Theorem 1.3.7 (1) (3) (4b) (4c). 


Exercise 1.3.7. Let x, y, z ∈ Z.


(1) Prove that x < y if and only if −x > −y.
(2) Prove that if z < 0, then x < y if and only if xz > yz.


Exercise 1.3.8. [Used in Exercise 1.5.9.] Let x ∈ Z. Prove that if x > 0 then x ≥ 1.
Prove that if x < 0 then x ≤ −1.


Exercise 1.3.9. 


(1) Prove that 1 < 2. 
(2) Let x ∈ Z. Prove that 2x ≠ 1.


Exercise 1.3.10. [Used in Section 1.3.] Prove that the Well-Ordering Principle (Theo- 
rem 1.2.10), which was stated for N in Section 1.2, still holds when we think of N as 
the set of positive integers. That is, let G ⊆ {x ∈ Z | x > 0} be a non-empty set. Prove
that there is some m ∈ G such that m ≤ g for all g ∈ G. Use Theorem 1.3.7.


Exercise 1.3.11. [Used in Lemma 1.3.8.] Prove Lemma 1.3.8 (1) (3) (4) (5) (7) (10)
(11). 


1.4 Entry 2: Axioms for the Integers


There are two standard approaches to defining the integers: starting with the Peano 
Postulates for the natural numbers and then constructing the integers from the natural 
numbers, or taking the integers axiomatically as an ordered integral domain with an ad- 
ditional condition. The former approach, which we took in Sections 1.2 and 1.3, is the 
longer of the two, but is more revealing of the inner workings of the natural numbers. 
The latter approach, which we take in the present section, is somewhat shorter, and 
therefore provides a slightly quicker route to constructing the real numbers, though 
it still involves constructing the rational numbers and the real numbers, as in the
other approach. Both approaches ultimately lead to the same facts about the natural 
numbers and the integers, though they work in reverse order. The Peano Postulates 
for the natural numbers, which are taken axiomatically in Section 1.2, are a theorem 
in the present section, and the axioms for the integers given in the present section are 
theorems in Sections 1.2 and 1.3. In other words, whether we start with the natural 
numbers and construct the integers, or whether we start with the integers and find the 
natural numbers inside them, we obtain the same sets of numbers. Hence, if you have 
read Sections 1.2 and 1.3, you should skip the present section. Our treatment of the 
integers in the present section, which is less commonly used than starting with the 
Peano Postulates but is nonetheless a nice alternative, follows [Dea66, Chapter 3]. 

What is it that characterizes the integers? Certainly, the integers have two binary 
operations, namely, addition and multiplication, as well as the unary operation nega- 
tion, and the relation less than, and these operations and relation satisfy a number of 
standard properties (for example, the fact that x + y = y + x for all integers x and y).
However, these aspects alone do not characterize the integers, because we will see 
that the set of rational numbers and the set of real numbers have these same operations 
and relation, which satisfy these same properties. What distinguishes the integers 
from the rational numbers and the real numbers is the “discreteness” of the integers, 
which intuitively means that for each integer, there is a unique integer “right below
it,” namely, the integer minus 1, and a unique integer “right above it,” namely, the
integer plus 1. Phrased another way, if a is an integer, then there is no integer between
a and a + 1. We will prove in Theorem 1.4.6 that the integers are discrete in this sense.
It turns out, however, that this approach to discreteness does not characterize the 
integers among ordered integral domains, as seen by Exercise 1.4.7. Hence, we will 
need to use a different characterization of discreteness in the axioms for the integers, 
as discussed below. 

In order to state our axioms for the integers, we start with the following definition 
involving some algebraic properties of addition, multiplication and less than. If the 
reader has studied abstract algebra, she has probably encountered the concept of an 
integral domain, though perhaps not an ordered integral domain, which is simply an 
integral domain as standardly defined in abstract algebra together with some additional 
properties for the order relation, and the interaction of this relation with addition and 
multiplication. We do not assume that the reader is already familiar with integral 
domains, and we will give the complete definition here. 

In the following definition, we use the notion of a binary operation and a unary
operation on a set, as defined in Section 1.1. As is usual, we will write “xy” instead of
“x · y,” except in cases of potential ambiguity (for example, we will write “1 · 1” rather
than “11”), or where the · makes the expression easier to read. We will write x > y to
mean the same thing as y < x.


Definition 1.4.1. An ordered integral domain is a set R with elements 0, 1 ∈ R,
binary operations + and ·, a unary operation − and a relation <, which satisfy the
following properties. Let x, y, z ∈ R.


a. (x + y) + z = x + (y + z) (Associative Law for Addition).
b. x + y = y + x (Commutative Law for Addition).
c. x + 0 = x (Identity Law for Addition).
d. x + (−x) = 0 (Inverses Law for Addition).
e. (xy)z = x(yz) (Associative Law for Multiplication).
f. xy = yx (Commutative Law for Multiplication).
g. x · 1 = x (Identity Law for Multiplication).
h. x(y + z) = xy + xz (Distributive Law).
i. If xy = 0, then x = 0 or y = 0 (No Zero Divisors Law).
j. Precisely one of x < y or x = y or x > y holds (Trichotomy Law).
k. If x < y and y < z, then x < z (Transitive Law).
l. If x < y then x + z < y + z (Addition Law for Order).
m. If x < y and z > 0, then xz < yz (Multiplication Law for Order).
n. 0 ≠ 1 (Non-Triviality). △


The Non-Triviality axiom might seem, well, trivial, but it is very much needed, 
because otherwise we could have an ordered integral domain consisting of a single 
number 0, which is not what we would want the set of integers to be. 
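
The properties in Definition 1.4.1 can be tested mechanically on finite samples. The
Python sketch below uses Python's built-in int and fractions.Fraction types as
stand-ins for the integers and the rational numbers; this choice, the function name
satisfies_axioms and the particular samples are assumptions made for illustration.
Passing such a finite test proves nothing in general, but it shows concretely that
more than one number system is compatible with properties (a) through (n).

```python
from fractions import Fraction
from itertools import product

def satisfies_axioms(sample, zero, one):
    """Check properties (a)-(n) of Definition 1.4.1 on a finite sample."""
    for x, y, z in product(sample, repeat=3):
        checks = [
            (x + y) + z == x + (y + z),                  # (a)
            x + y == y + x,                              # (b)
            x + zero == x,                               # (c)
            x + (-x) == zero,                            # (d)
            (x * y) * z == x * (y * z),                  # (e)
            x * y == y * x,                              # (f)
            x * one == x,                                # (g)
            x * (y + z) == x * y + x * z,                # (h)
            x * y != zero or x == zero or y == zero,     # (i)
            [x < y, x == y, x > y].count(True) == 1,     # (j)
            not (x < y and y < z) or x < z,              # (k)
            not (x < y) or x + z < y + z,                # (l)
            not (x < y and z > zero) or x * z < y * z,   # (m)
            zero != one,                                 # (n)
        ]
        if not all(checks):
            return False
    return True

ints = list(range(-3, 4))
rationals = [Fraction(n, d) for n in range(-2, 3) for d in (1, 2, 3)]
```

Both satisfies_axioms(ints, 0, 1) and satisfies_axioms(rationals, Fraction(0),
Fraction(1)) evaluate to True on these samples.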

As previously stated, the properties of an ordered integral domain do not alone 
characterize the integers. It will be seen from Definition 2.2.1 and Lemma 2.3.2 (15) 
that the real numbers are an ordered integral domain as well. In order to distinguish 
the integers from all other ordered integral domains, we will need one additional 
axiom, to which we now turn, starting with the following definition. 


Definition 1.4.2. Let R be an ordered integral domain, and let A ⊆ R be a set.


1. The relation ≤ on R is defined by a ≤ b if and only if a < b or a = b, for all
   a, b ∈ R.
2. The set A has a least element if there is some a ∈ A such that a ≤ x for all
   x ∈ A. △


If we think of the integers intuitively, we see that some subsets of the integers 
have least elements (for example, all finite subsets of the integers have least elements), 
whereas other subsets of the integers do not have least elements (for example, the 
set of negative integers). In fact, it is precisely the existence of negative integers that 
prevents all subsets of the integers from having least elements; intuitively, because 
of the “discreteness” of the integers, every subset of the positive integers has a least 
element. The real numbers, by contrast, are also an ordered integral domain, but not 
every subset of the positive real numbers has a least element (for example, the set of all
positive real numbers does not have a least element). It turns out that this fact about 
subsets of the positive integers is a strong version of “discreteness,” and it actually 
characterizes the integers. 
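
The contrast between the positive integers and the positive real numbers can be
illustrated as follows. In the Python sketch below, least_element searches a finite set
for an element satisfying Definition 1.4.2 (2), and smaller_positive shows why no
positive rational number can be the least one. Both function names are assumptions
made for illustration, and rational numbers are used in place of real numbers so that
the arithmetic is exact.

```python
from fractions import Fraction

def least_element(A):
    """Return a in A with a <= x for all x in A, or None if there is none."""
    for a in A:
        if all(a <= x for x in A):
            return a
    return None

def smaller_positive(q):
    """Given a positive rational q, return a positive rational below it."""
    return q / 2

G = {12, 5, 9, 31}           # a non-empty finite set of positive integers
q = Fraction(1, 1000)        # a candidate "least" positive rational
```

Here least_element(G) returns 5, whereas for any positive q the number
smaller_positive(q) is again positive and strictly less than q, so the set of all
positive rational numbers has no least element.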


Definition 1.4.3. Let R be an ordered integral domain. The ordered integral domain
R satisfies the Well-Ordering Principle if every non-empty subset of
{x ∈ R | x > 0}
has a least element. △

We are now ready for our axiomatic characterization of the integers.


Axiom 1.4.4 (Axiom for the Integers). There exists an ordered integral domain Z 
that satisfies the Well-Ordering Principle. 


The Axiom for the Integers (Axiom 1.4.4) does not say that the integers are unique, 
though in fact that turns out to be true; the proof of this fact is given in the first two 
steps of the proof of Theorem 2.7.1, and discussion of what we mean by uniqueness 
in this context is found at the start of Section 2.7. 

We now turn to some very useful, and very familiar, properties of the integers. 


Lemma 1.4.5. Let x, y, z ∈ Z.


1. If x + z = y + z, then x = y (Cancellation Law for Addition).
2. −(−x) = x.
3. −(x + y) = (−x) + (−y).
4. x · 0 = 0.
5. If z ≠ 0 and if xz = yz, then x = y (Cancellation Law for Multiplication).
6. (−x)y = −xy = x(−y).
7. xy = 1 if and only if x = 1 = y or x = −1 = y.
8. x > 0 if and only if −x < 0, and x < 0 if and only if −x > 0.
9. 0 < 1.
10. If x ≤ y and y ≤ x, then x = y.
11. If x > 0 and y > 0, then xy > 0. If x > 0 and y < 0, then xy < 0.
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Proof. We will prove Parts (2), (6), (8) and (9), leaving the rest to the reader in 
Exercise 1.4.1. 


(2) By the Inverses Law for Addition we know that (−x) + [−(−x)] = 0. It 
follows that x + {(−x) + [−(−x)]} = x + 0. By the Associative and Identity Laws for 
Addition we deduce that {x + (−x)} + [−(−x)] = x, and hence by the Inverses Law 
for Addition we see that 0 + [−(−x)] = x. We now use the Commutative and Identity 
Laws for Addition to conclude that −(−x) = x. 


(6) By the Inverses Law for Addition we know that x + (−x) = 0. We then 
use the Distributive Law to deduce that yx + y(−x) = y[x + (−x)] = y · 0. It follows 
from Part (4) of this lemma that yx + y(−x) = 0, and by the Commutative Law for 
Multiplication we obtain xy + (−x)y = 0. By the Inverses Law for Addition we know 
that xy + (−xy) = 0. Hence xy + (−x)y = xy + (−xy), and using the Commutative 
Law for Addition we see that (−x)y + xy = (−xy) + xy. Using Part (1) of this lemma 
we deduce that (−x)y = −xy. A similar argument shows that x(−y) = −xy. 


(8) Suppose that x > 0. Then by the Addition Law for Order we see that 
x + (−x) > 0 + (−x). By the Commutative, Identity and Inverses Laws for Addition we 
deduce that 0 > −x, which is the same as −x < 0. A similar argument shows that 
−x < 0 implies x > 0, and also shows that x < 0 if and only if −x > 0. 


(9) By Non-Triviality we know that 0 ≠ 1. It therefore follows from the 
Trichotomy Law that either 0 < 1 or 0 > 1, but not both. Suppose that 0 > 1. By 
Part (8) of this lemma we deduce that 0 < −1. It follows from the Multiplication 
Law for Order that 0 · (−1) < (−1)(−1), and by the Commutative Law for 
Multiplication and Part (4) of this lemma we deduce that 0 < (−1)(−1). By Part (6) 
of this lemma, the Identity Law for Multiplication and Part (2) of this lemma we 
see that (−1)(−1) = −[(−1) · 1] = −(−1) = 1. We conclude that 0 < 1, which is a 
contradiction to our assumption that 0 > 1. Therefore 0 > 1 is impossible, and hence 
it must be the case that 0 < 1. 


We note that the proof of Lemma 1.4.5 makes use of only the properties of the 
integers given in Definition 1.4.1, and it does not make full use of all the properties 
of the integers. By contrast, the following theorem, which shows that the integers 
are “discrete” in the intuitive sense, makes use of the full power of the Well-Ordering 
Principle. 


Theorem 1.4.6. Let x ∈ Z. Then there is no y ∈ Z such that x < y < x + 1. 


Proof. First, suppose that x = 0. Suppose further that there is some y ∈ Z such that 
0 < y < 1. Let 
S = {z ∈ Z | 0 < z < 1}. 


We know that S ≠ ∅, because y ∈ S. From the definition of N, we observe that S ⊆ N. 
It follows from the Well-Ordering Principle that S has a least element. That is, there is 
some p ∈ S such that p ≤ x for all x ∈ S. By the definition of S we know that 0 < p < 1. 
Using the Multiplication Law for Order we see that 0 · p < p · p < 1 · p, and hence by 
the Commutative and Identity Laws for Multiplication and Lemma 1.4.5 (4) it follows 
that 0 < p · p < p. By the Transitive Law we deduce that 0 < p · p < 1. It follows that 
p · p ∈ S, and p · p < p, which is a contradiction to the fact that p is the least element 
of S. We deduce that there is no y ∈ Z such that 0 < y < 1. 

We now consider the general case, with arbitrary x. Suppose that there is some 
w ∈ Z such that x < w < x + 1. By the Addition Law for Order we deduce that 
x + (−x) < w + (−x) < (x + 1) + (−x), and then using the Associative, Inverses 
and Identity Laws for Addition we see that 0 < w + (−x) < 1. We therefore have a 
contradiction to the previous paragraph, and hence no such w exists. 
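
The step 0 < p < 1 implies 0 < p · p < p in the proof above uses only the laws of an 
ordered integral domain, so it is also valid in number systems that are not discrete. 
The following Python sketch, which is not part of the original text, illustrates what 
happens in such a system. It assumes Python's standard fractions module as a stand-in 
for the rational numbers, and the function name squares_below is hypothetical. 
Repeated squaring produces ever smaller elements between 0 and 1, so no least element 
can exist there, and no contradiction arises. 

```python
from fractions import Fraction

def squares_below(p, steps):
    """Return the list [p, p*p, (p*p)*(p*p), ...] after `steps` squarings."""
    chain = [p]
    for _ in range(steps):
        chain.append(chain[-1] * chain[-1])
    return chain

# Start from a rational number strictly between 0 and 1.
chain = squares_below(Fraction(1, 2), 4)

# Every term stays strictly between 0 and 1 and is smaller than the one before,
# mirroring the inequality 0 < p*p < p used in the proof.
strictly_decreasing = all(0 < b < a < 1 for a, b in zip(chain, chain[1:]))
```

In the integers the Well-Ordering Principle rules out such a descending chain, which 
is exactly the contradiction the proof exploits. 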


Our final task in this section is to show that the natural numbers sit inside the 
integers, and behave appropriately. We start with the following definition. 
Definition 1.4.7. 


1. Let x ∈ Z. The number x is positive if x > 0, and the number x is negative if 
x < 0. 
2. The set of natural numbers, denoted N, is defined by 


N = {x ∈ Z | x > 0}. △


If we let −N denote the set of negative integers, then it follows from the 
Trichotomy Law (Theorem 1.3.5 (10)) that Z = −N ∪ {0} ∪ N, and that these three sets 
are pairwise disjoint. 

We want to verify that the set N as defined in Definition 1.4.7 (2) indeed behaves 
the way we expect the natural numbers to behave. We will do that by proving that 
the set N as we have now defined it satisfies the “Peano Postulates,” which were 
taken as the axioms for the natural numbers in Section 1.2. By proving that the Peano 
Postulates follow from the axioms for the integers stated in Axiom 1.4.4 we come full 
circle in our study of the natural numbers and the integers, because in Sections 1.2 
and 1.3 it was shown that the axioms for the integers follow from the Peano Postulates 
(see Theorem 1.2.10 and Theorem 1.3.5). 

To be able to state the following theorem, we note that by Exercise 1.4.2 the map 
s given in the theorem is well-defined. 


Theorem 1.4.8 (Peano Postulates). Let s: N → N be defined by s(n) = n + 1 for all 
n ∈ N. 


a. There is no n ∈ N such that s(n) = 1. 

b. The function s is injective. 

c. Let G ⊆ N be a set. Suppose that 1 ∈ G, and that if g ∈ G then s(g) ∈ G. Then 
G = N. 


Proof. 


(a) Suppose to the contrary that there were some x ∈ N such that s(x) = 1. Then 
x + 1 = 1, and therefore x + 1 = 0 + 1 by the Commutative and Identity Laws for 
Addition. It follows from Lemma 1.4.5 (1) that x = 0. However, we know by 
Definition 1.4.7 that x > 0, which is a contradiction to the Trichotomy Law. 


(b) Suppose that s(n) = s(m) for some n,m ∈ N. Then n + 1 = m + 1, and hence 
n = m by Lemma 1.4.5 (1). Therefore s is injective. 


(c) Suppose that G ≠ N. Then N − G is non-empty. By the Well-Ordering 
Principle the set N − G has a least element, which means that there is some m ∈ N − G such 
that m ≤ x for all x ∈ N − G. Observe that m ≠ 1, because 1 ∈ G. By the Trichotomy 
Law, we know that m < 1 or m > 1. Suppose that m < 1. Because m ∈ N we know 
that m > 0. It follows that 0 < m < 1. Using the Commutative and Identity Laws 
for Addition we see that 0 < m < 0 + 1, which is a contradiction to Theorem 1.4.6. 
Therefore it could not have been the case that m < 1. Hence m > 1. Applying 
Theorem 1.4.6 again, we know that it cannot be the case that 1 < m < 1 + 1, and therefore 
by the Trichotomy Law it must be the case that m ≥ 1 + 1. It follows from the Inverses, 
Identity and Associative Laws for Addition that m + (−1) ≥ 1. By Lemma 1.4.5 (9) 
we know that 0 < 1, and it then follows from the Transitive Law that m + (−1) > 0. 
Hence m + (−1) ∈ N. 

By Lemma 1.4.5 (9) (8) we see that −1 < 0. Using the Addition Law for Order 
and the Identity Law for Addition we deduce that m + (−1) < m. Because m is the 
least element of N − G, then m + (−1) ∉ N − G, and hence m + (−1) ∈ G. By the 
Associative, Identity and Inverses Laws for Addition, and by hypothesis on G, we 
deduce that m = [m + (−1)] + 1 = s(m + (−1)) ∈ G, which is a contradiction to the 
fact that m ∈ N − G. We conclude that G = N. 


Part (c) of the Peano Postulates is a formal statement that proof by induction 
works. We will discuss proof by induction further in Section 2.5. 

Finally, we note that because 1 ∈ Z, we can now use all the familiar integers such 
as the number 2, which is defined to be 1 + 1; the number 3, which is defined to be 
2 + 1; the number 4, which is defined to be 3 + 1; and so on. 


Reflections 


The reader who has commenced reading this text with the present section need 
not read either Section 1.2, which has axioms for the natural numbers, or Section 2.2, 
which has axioms for the real numbers. However, even though it is not necessary for 
the reader of the present section to read those two other sections in order to have a 
completely rigorous treatment of the real numbers, it might be of interest to the reader 
to compare the types of axioms in each of those two other sections with the axioms 
in the present section. As the reader will see from such a comparison, the axioms in 
the present section share some of the features of the axioms of each of the two other 
sections. 

What the axioms for the integers in the present section and the axioms for the 
natural numbers in Section 1.2 have in common is that they are entirely algebraic 
in nature, and can be found in some texts on algebra. By contrast, the axioms for 
the real numbers in Section 2.2 include the Least Upper Bound Property, which is 
at the heart of real analysis, and is rarely discussed in a purely algebraic context. 
What the axioms for the integers in the present section and the axioms for the real 
numbers in Section 2.2 have in common is their parallel structure, namely, both start 
with the axioms for a general algebraic structure with two binary operations and an 
order relation (ordered integral domains and ordered fields, respectively), and both 
add one additional axiom to distinguish the particular set of numbers in question 
(the Well-Ordering Principle and the Least Upper Bound Property, respectively). By 
contrast, the axioms for the natural numbers in Section 1.2 do not at all resemble 
the axioms in the two other sections, in that for the natural numbers neither a binary 
operation nor an order relation is postulated, and the assumptions made seem much 
more minimal than the two other sets of axioms. 

From the above comparison, one could conclude that the axioms in the present 
section are either the best of both worlds, being purely algebraic and yet parallel 
to other useful axiom systems, or the worst of both, being neither as minimal as 
one alternative nor as powerful as the other. It is left to the reader to decide which 
assessment, if either, is the correct one. 


Exercises 


Exercise 1.4.1. [Used in Lemma 1.4.5.] Prove Lemma 1.4.5 (1) (3) (4) (5) (7) (10) 
(11). 


Exercise 1.4.2. [Used in Section 1.4.] Let n ∈ N. Prove that n + 1 ∈ N. 


Exercise 1.4.3. [Used in Exercise 1.4.8.] Let x,y ∈ Z. Prove that x < y if and only if 
−x > −y. 


Exercise 1.4.4. [Used in Exercise 1.4.6, Exercise 1.4.8 and Exercise 1.5.9.] Prove that 
N = {x ∈ Z | x ≥ 1}. 


Exercise 1.4.5. Let a,b ∈ Z. Prove that if a < b, then a + 1 ≤ b. 


Exercise 1.4.6. [Used in Theorem 2.5.4.] Let n ∈ N. Suppose that n ≠ 1. Prove that 
there is some b ∈ N such that b + 1 = n. [Use Exercise 1.4.4.] 


Exercise 1.4.7. [Used in Section 1.4.] Let Z[x] denote the set of polynomials with 
integer coefficients and variable x. This set has binary operations + and · as usual 
for polynomials. The relation <, called the dictionary order on Z[x], is defined by 
f < g if and only if either the degree of f is less than the degree of g, or if the degrees 
of f and g are equal and if f ≠ g and if the highest degree coefficient which differs 
for f and g is smaller for f, for all f,g ∈ Z[x]. Let 0,1 ∈ Z[x] be the polynomials that 
are constantly 0 and 1, respectively. 


(1) Prove that Z[x], with +, ·, <, 0 and 1 as defined above, is an ordered integral 
domain. 

(2) Let f ∈ Z[x]. Prove that there is no g ∈ Z[x] such that f < g < f + 1. 

(3) Prove that Z[x] does not satisfy the Well-Ordering Principle. 


Exercise 1.4.8. [Used in Exercise 1.4.10.] Let a ∈ Z. 

(1) Let G ⊆ {x ∈ Z | x ≥ a} be a set. Suppose that a ∈ G, and that if g ∈ G then 
g + 1 ∈ G. Prove that G = {x ∈ Z | x ≥ a}. [Use Exercise 1.4.4.] 
(2) Let H ⊆ {x ∈ Z | x ≤ a} be a set. Suppose that a ∈ H, and that if h ∈ H then 
h + (−1) ∈ H. Prove that H = {x ∈ Z | x ≤ a}. [Use Exercise 1.4.3.] 


Exercise 1.4.9. [Used in Exercise 1.4.10.] The two standard approaches to defining 
the integers are the approaches we took in Section 1.3 (where we started with the 
Peano Postulates for the natural numbers, and constructed the integers from the 
natural numbers) and in Section 1.4 (where we took the integers axiomatically as an 
ordered integral domain that satisfies the Well-Ordering Principle). The purpose of 
this exercise is to present an additional approach to axiomatizing the integers, which 
is from [Mar61]. 

The idea for this alternative axiom for the integers is to modify the Peano 
Postulates so that there is no “first” natural number, but instead every integer has both a 
successor and a predecessor; proof by induction is done by “going in both directions” 
at once. We will not give all the details of the equivalence of this approach with the 
approaches we have taken, because doing so would be as lengthy as Section 1.2 (and 
of a similar nature), but we will sketch part of the ideas. In this exercise we will start 
with the alternative axiom, and go as far as showing that the integers as given by this 
alternative axiom contain a set, called the natural numbers, that satisfies the Peano 
Postulates. In Exercise 1.4.10 it will be seen that the integers, as defined in the present 
section, satisfy the alternative axiom. 

The alternative axiom for the integers is that there exists a set Z and a function 
s: Z → Z that satisfy the following five properties. 


a. Z ≠ ∅. 

b. The function s is injective. 

c. For each a ∈ Z, there is some b ∈ Z such that a = s(b). 

d. Let G ⊆ Z be a set. Suppose that G ≠ ∅, and that g ∈ G if and only if s(g) ∈ G. 
Then G = Z. 

e. There is some Q ⊆ Z such that Q ≠ ∅ and Q ≠ Z, and that if q ∈ Q then 
s(q) ∈ Q. 


The following items help clarify the nature of this alternative axiom. 


(1) Find an example to show that Property (e) of these axioms cannot be dropped. 
That is, find an example of a set W and a function s: W → W that satisfy 
all the axioms except Property (e). 


(2) Prove that s is bijective. 

(3) Prove that Property (d) can be replaced with the following condition: Let 
G ⊆ Z be a set. Suppose that G ≠ ∅, and that if g ∈ G then s(g) ∈ G and 
s⁻¹(g) ∈ G. Then G = Z. (This reformulation of Property (d) is more clearly 
seen to be “induction going in both directions.”) 


(4) Let Q ⊆ Z be a set satisfying Property (e). Prove that there is some r ∈ Z such 
that r ∉ Q and s(r) ∈ Q. 


(5) Let Q and r be as in Part (4) of this exercise. Rename the element r as “0.” Let 
1 = s(0). A subset A ⊆ Z is natural if it has the properties that 0 ∉ A, that 
1 ∈ A, and that if a ∈ A then s(a) ∈ A. Observe that there exist natural subsets 
of Z, for example the set Q. We then define N ⊆ Z to be the intersection of 
all natural subsets of Z. First, prove that N is a natural subset of Z, and that 
it is a subset of every other natural subset of Z. Second, prove that N as so 
defined satisfies the Peano Postulates (as stated in Theorem 1.4.8, though 
with the function s given by the axioms in this exercise, and not as given in 
Theorem 1.4.8). 


Exercise 1.4.10. This exercise makes use of Exercise 1.4.9. Prove that Z, as de- 
fined in the present section, satisfies the alternative axiom given in Exercise 1.4.9. 


[Use Exercise 1.4.8.] 
1.5 Constructing the Rational Numbers 


Although the integers have nicer properties than the natural numbers (for example, 
we can take negatives of integers), the integers are still not entirely satisfactory from 
an algebraic viewpoint, because we cannot divide one integer by another and still 
expect to obtain an integer. To remedy this problem, we will now construct the rational 
numbers (which intuitively are the fractions) from the integers to allow for division, 
or, equivalently, to allow for the existence of multiplicative inverses of integers other 
than zero. We will, as expected, find a copy of the set of integers inside the set of 
rational numbers. 

For the sake of brevity, we will sometimes use the standard properties of the 
integers (for example the Commutative Law for Addition) that we have previously 
encountered in Section 1.3 or 1.4 without always giving explicit references to the 
relevant theorems and lemmas. The constructions and proofs in this section are very 
much analogous to the construction of the integers from the natural numbers in 
Section 1.3. (For the reader who has not read Section 1.3, we note that we do not make 
any use here of that section, and we mention it only by way of analogy for the reader 
who has read Section 1.3). In particular, we will make crucial use of equivalence 
relations and equivalence classes in the construction of the rational numbers from the 
integers. 

We want to think of rational numbers as expressions of the form a/b, where 
a,b ∈ Z, and where b ≠ 0. However, because we do not have the operation division 
on Z, we will replace “a/b” in our construction of the rational numbers with the pair 
(a,b). It certainly happens that “a/b” equals “c/d” for some a,b,c,d ∈ Z, where a ≠ c 
and b ≠ d, and b ≠ 0 and d ≠ 0, and in that case we ought to have the pairs (a,b) and 
(c,d) represent the same rational number. To take care of this problem we define the 
following relation on pairs of integers. 


Definition 1.5.1. Let Z* = Z − {0}. The relation ≍ on Z × Z* is defined by (x,y) ≍ 
(z,w) if and only if xw = yz, for all (x,y),(z,w) ∈ Z × Z*. △


Lemma 1.5.2. The relation ≍ is an equivalence relation. 


Proof. We will prove transitivity, leaving reflexivity and symmetry to the reader 
in Exercise 1.5.1. Let (x,y),(z,w),(u,v) ∈ Z × Z*. Suppose that (x,y) ≍ (z,w) and 
(z,w) ≍ (u,v). Then xw = yz and zv = wu. It follows that (xw)v = (yz)v and y(zv) = 
y(wu), which implies that (xv)w = (yz)v and (yz)v = (yu)w, and hence (xv)w = (yu)w. 
We know that w ≠ 0, and therefore we deduce that xv = yu. It follows that (x,y) ≍ 
(u,v). Therefore ≍ is transitive. 
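
As an informal check that is not part of the original text, the three properties of 
an equivalence relation can be tested by brute force on a small finite sample of 
Z × Z*. The sketch below is only an illustration under that assumption (a finite 
sample proves nothing in general), and the names related and pairs are hypothetical. 
It also shows where the hypothesis w ≠ 0 from the proof of transitivity is needed. 

```python
from itertools import product

def related(p, q):
    """The relation of Definition 1.5.1: (x, y) ~ (z, w) iff x*w == y*z."""
    (x, y), (z, w) = p, q
    return x * w == y * z

# A small finite sample of Z x Z*: second coordinates are non-zero.
pairs = [(x, y) for x in range(-3, 4) for y in range(-3, 4) if y != 0]

reflexive = all(related(p, p) for p in pairs)
symmetric = all(
    related(q, p) for p, q in product(pairs, repeat=2) if related(p, q)
)
transitive = all(
    related(p, r)
    for p, q, r in product(pairs, repeat=3)
    if related(p, q) and related(q, r)
)

# If a zero second coordinate were allowed, transitivity would fail:
# (1, 2) and (3, 4) are both related to (0, 0), but not to each other.
zero_breaks_transitivity = (
    related((1, 2), (0, 0))
    and related((0, 0), (3, 4))
    and not related((1, 2), (3, 4))
)
```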


The set of rational numbers, together with addition, multiplication, negation, 
multiplicative inverse and the relations less than and less than or equal to, are given 
in the following definition. We use the standard symbols +, ·, −, < and ≤ in the 
definition, though we need to be careful to note that these symbols formally mean 
different things when used with the rational numbers and when used with the integers. 
(Moreover, it is not possible to “add” an integer and a rational number using addition 
as we have defined it; this problem will be resolved when we see that a copy of the 
integers sits inside the set of rational numbers.) Recall the definition of binary and 
unary operations in Section 1.1. 


Definition 1.5.3. The set of rational numbers, denoted Q, is the set of equivalence 
classes of Z × Z* with respect to the equivalence relation ≍. 

The elements 0̄,1̄ ∈ Q are defined by 0̄ = [(0,1)] and 1̄ = [(1,1)]. Let Q* = 
Q − {0̄}. The binary operations + and · on Q are defined by 


[(x,y)] + [(z,w)] = [(xw + yz, yw)] 
[(x,y)] · [(z,w)] = [(xz, yw)] 


for all [(x,y)],[(z,w)] ∈ Q. The unary operation − on Q is defined by −[(x,y)] = 
[(−x,y)] for all [(x,y)] ∈ Q. The unary operation ⁻¹ on Q* is defined by [(x,y)]⁻¹ = 
[(y,x)] for all [(x,y)] ∈ Q*. The relation < on Q is defined by [(x,y)] < [(z,w)] if and 
only if either xw < yz when y > 0 and w > 0 or when y < 0 and w < 0, or xw > yz 
when y > 0 and w < 0 or when y < 0 and w > 0, for all [(x,y)],[(z,w)] ∈ Q. The 
relation ≤ on Q is defined by [(x,y)] ≤ [(z,w)] if and only if [(x,y)] < [(z,w)] or 
[(x,y)] = [(z,w)], for all [(x,y)],[(z,w)] ∈ Q. △
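
To make the definition concrete, the following Python sketch (not part of the 
original text) models an equivalence class by any one of its representatives. The 
class name Rat and the method name inverse are hypothetical, and Python integers are 
assumed as a stand-in for Z. Equality of two Rat objects is tested with the relation 
of Definition 1.5.1, so different representatives of the same class compare equal. 

```python
class Rat:
    """The class [(x, y)] of a pair (x, y) with y != 0, stored via a representative."""

    def __init__(self, x, y):
        if y == 0:
            raise ValueError("second coordinate must be non-zero")
        self.x, self.y = x, y

    def __eq__(self, other):
        # Two classes are equal when their representatives are related.
        return self.x * other.y == self.y * other.x

    def __add__(self, other):
        return Rat(self.x * other.y + self.y * other.x, self.y * other.y)

    def __mul__(self, other):
        return Rat(self.x * other.x, self.y * other.y)

    def __neg__(self):
        return Rat(-self.x, self.y)

    def inverse(self):
        # Only defined on Q*, that is, when x != 0; otherwise Rat raises ValueError.
        return Rat(self.y, self.x)

    def __lt__(self, other):
        # The sign cases of the definition of < on Q.
        if (self.y > 0) == (other.y > 0):
            return self.x * other.y < self.y * other.x
        return self.x * other.y > self.y * other.x


half, third = Rat(1, 2), Rat(1, 3)
total = half + third          # the class of (5, 6)
product_ = half * Rat(2, 3)   # the class of (2, 6), the same class as (1, 3)
```

Storing an arbitrary representative, rather than a reduced one, keeps the sketch close 
to the definition; it is the well-definedness of the operations that makes this choice 
harmless. 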


We now need to check whether the binary operations + and ·, the unary operations 
− and ⁻¹ and the relation < are well-defined. For example, we defined [(x,y)] · [(z,w)] 
to be [(xz,yw)], but for this definition to make sense, we need to verify that if [(x,y)] = 
[(p,q)] and [(z,w)] = [(s,t)], then [(xz,yw)] = [(ps,qt)]. That is, we need to verify 
that the product of the equivalence classes [(x,y)] and [(z,w)] depends only upon the 
equivalence classes themselves, and not upon the particular elements that are used to 
represent the equivalence classes. As seen in the following lemma, everything works 
out well. (We do not deal with the relation ≤ in the lemma, because that is defined in 
terms of <.) 


Lemma 1.5.4. The binary operations + and ·, the unary operations − and ⁻¹, and 
the relation <, all on Q, are well-defined. 


Proof. We will show that · and − are well-defined, leaving the rest to the reader in 
Exercise 1.5.2. 

Let (x,y),(z,w),(a,b),(c,d) ∈ Z × Z*. Suppose that [(x,y)] = [(a,b)] and that 
[(z,w)] = [(c,d)]. 

By hypothesis we know that (x,y) ≍ (a,b) and (z,w) ≍ (c,d). Hence xb = ya 
and zd = wc. By multiplying these two equations and doing some rearranging we 
obtain (xz)(bd) = (yw)(ac), and this implies that [(xz,yw)] = [(ac,bd)]. Therefore 
· is well-defined. Also, from xb = ya we deduce that (−x)b = y(−a), and hence 
[(−x,y)] = [(−a,b)]. Therefore − is well-defined. 
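
The meaning of “well-defined” can be illustrated numerically. In the following 
Python sketch, which is not part of the original text, pairs of integers stand for 
representatives, and the function names related, mul, neg and bad_add are 
hypothetical. The rule bad_add is an assumption introduced purely for contrast: it is 
a tempting rule that is not well-defined, because its result changes class when the 
representatives change. 

```python
def related(p, q):
    """Pairs (x, y) and (z, w) represent the same class when x*w == y*z."""
    (x, y), (z, w) = p, q
    return x * w == y * z

def mul(p, q):
    """Multiplication on representatives, as in Definition 1.5.3."""
    (x, y), (z, w) = p, q
    return (x * z, y * w)

def neg(p):
    """Negation on representatives, as in Definition 1.5.3."""
    x, y = p
    return (-x, y)

def bad_add(p, q):
    """An ill-defined rule (for contrast only): add coordinates separately."""
    (x, y), (z, w) = p, q
    return (x + z, y + w)

p, a = (1, 2), (3, 6)   # two representatives of one class
q, c = (2, 3), (4, 6)   # two representatives of another class

mul_agrees = related(mul(p, q), mul(a, c))              # (2, 6) versus (12, 36)
neg_agrees = related(neg(p), neg(a))                    # (-1, 2) versus (-3, 6)
bad_add_agrees = related(bad_add(p, q), bad_add(a, c))  # (3, 5) versus (7, 12)
```

Here mul_agrees and neg_agrees are true, as the lemma guarantees, whereas 
bad_add_agrees is false. 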


The following theorem states the most fundamental algebraic properties of the 
rational numbers. The idea of the proof of the following theorem is to rephrase things 
in terms of integers, and then use the appropriate facts we have already seen about the 
integers. As is usual, we will write “rs” instead of “r · s,” except in cases of potential 
ambiguity, or for ease of reading. We will write r > s to mean the same thing as s < r, 
and similarly for r ≥ s. 
Theorem 1.5.5. Let r,s,t ∈ Q. 


1. (r + s) + t = r + (s + t) (Associative Law for Addition). 
2. r + s = s + r (Commutative Law for Addition). 
3. r + 0̄ = r (Identity Law for Addition). 
4. r + (−r) = 0̄ (Inverses Law for Addition). 
5. (rs)t = r(st) (Associative Law for Multiplication). 
6. rs = sr (Commutative Law for Multiplication). 
7. r · 1̄ = r (Identity Law for Multiplication). 
8. If r ≠ 0̄, then r · r⁻¹ = 1̄ (Inverses Law for Multiplication). 
9. r(s + t) = rs + rt (Distributive Law). 
10. Precisely one of r < s or r = s or r > s holds (Trichotomy Law). 
11. If r < s and s < t, then r < t (Transitive Law). 
12. If r < s then r + t < s + t (Addition Law for Order). 
13. If r < s and t > 0̄, then rt < st (Multiplication Law for Order). 
14. 0̄ ≠ 1̄ (Non-Triviality). 


Proof. We will prove Parts (4), (7), (10) and (13), leaving the rest to the reader in 
Exercise 1.5.4. Suppose that r = [(x,y)], that s = [(z,w)] and that t = [(u,v)] for some 
x,z,u ∈ Z and y,w,v ∈ Z*. Throughout this proof we will make use of properties of 
the integers that we saw in Section 1.3 or 1.4. 


(4) By the definition of negation and addition of rational numbers we see that 
r + (−r) = [(x,y)] + [(−x,y)] = [(xy + y(−x), yy)] = [(0, yy)] = 0̄, where the last equality 
holds by Exercise 1.5.3 (1). 


(7) Using the definition of 1̄ and multiplication of rational numbers we see that 
r · 1̄ = [(x,y)] · [(1,1)] = [(x · 1, y · 1)] = [(x,y)] = r. 


(10) We need to show that precisely one of r < s, or r = s or r > s holds. There 
are two cases. 

First, suppose that y > 0 and w > 0, or that y < 0 and w < 0. Then the Trichotomy 
Law for the integers states that precisely one of yz < xw or yz = xw or yz > xw holds. 
If yz < xw then r > s by the definition of < for rational numbers. If yz = xw, then 
r = s by the definition of the relation ≍. If yz > xw then r < s by the definition of < 
for rational numbers. The second case, where y > 0 and w < 0, or where y < 0 and 
w > 0, is similar to the first case, and we omit the details. 


(13) Suppose that r < s and t > 0̄. 

There are now four cases. First, suppose that y > 0 and w > 0, or that y < 0 
and w < 0; and suppose that v > 0. By the definition of < for rational numbers, 
the inequality t > 0̄ implies v · 0 < u · 1, which implies 0 < u. Hence uv > 0. Again 
using the definition of < for rational numbers, the inequality r < s implies yz > xw. 
Therefore (yz)(uv) > (xw)(uv), and hence (zu)(yv) > (xu)(wv). Because v > 0, then 
y > 0 and w > 0 imply yv > 0 and wv > 0, and y < 0 and w < 0 imply yv < 0 
and wv < 0. We then deduce from the definition of < for rational numbers that 
[(xu,yv)] < [(zu,wv)]. However, by the definition of multiplication of rational numbers 
we know that rt = [(x,y)] · [(u,v)] = [(xu,yv)] and st = [(z,w)] · [(u,v)] = [(zu,wv)], 
and that completes the first case. 

The other three cases, which depend upon whether each of y, w and v is positive 
or negative, are similar to the first case, and we omit the details. 
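
Because the proofs of Parts (10) and (13) omit several sign cases, a brute-force check 
over a small sample can be reassuring. The following Python sketch is not part of the 
original text and is no substitute for the omitted cases; it assumes a finite sample 
of representatives with both positive and negative second coordinates, and the names 
related, less, mul and pairs are hypothetical. 

```python
from itertools import product

def related(p, q):
    """Representatives (x, y) and (z, w) give equal classes when x*w == y*z."""
    (x, y), (z, w) = p, q
    return x * w == y * z

def less(p, q):
    """The relation < of Definition 1.5.3, written on representatives."""
    (x, y), (z, w) = p, q
    if (y > 0) == (w > 0):      # second coordinates have the same sign
        return x * w < y * z
    return x * w > y * z        # second coordinates have opposite signs

def mul(p, q):
    (x, y), (z, w) = p, q
    return (x * z, y * w)

pairs = [(x, y) for x in range(-3, 4) for y in range(-3, 4) if y != 0]
zero = (0, 1)

# Part (10): precisely one of r < s, r = s, r > s holds.
trichotomy = all(
    [less(r, s), related(r, s), less(s, r)].count(True) == 1
    for r, s in product(pairs, repeat=2)
)

# Part (13): if r < s and t > 0, then rt < st.
multiplication_law = all(
    less(mul(r, t), mul(s, t))
    for r, s, t in product(pairs, repeat=3)
    if less(r, s) and less(zero, t)
)
```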


The properties of the rational numbers listed in Theorem 1.5.5 are satisfied by 
other number systems as well (for example the real numbers), and are sufficiently 
important to merit special terminology. Any set with two binary operations that 
satisfies Parts (1)-(9) is called a “field,” and any field that also has a relation that 
satisfies Parts (10)-(14) of this theorem is called an “ordered field.” Fields are 
studied extensively in abstract algebra. 

Although many of the properties of the rational numbers are similar to properties 
of the integers, we note that there is no analog for the rational numbers of Theo- 
rem 1.3.9 or Theorem 1.4.6. Indeed, it is seen in Exercise 1.5.7 that between any two 
rational numbers there is another rational number. 

Because of the way we constructed the rational numbers from the integers, these 
two sets are entirely disjoint. However, we can find a copy of the set of integers inside 
the set of rational numbers by identifying each integer x with the rational number 
[(x,1)], where we think informally of [(x,1)] as representing the fraction x/1. We will 
see in the following theorem that this identification preserves the numbers 0 and 1, 
the binary operations addition and multiplication and the relation less than. 


Theorem 1.5.6. Let i: Z → Q be defined by i(x) = [(x,1)] for all x ∈ Z. 


1. The function i: Z → Q is injective. 
2. i(0) = 0̄ and i(1) = 1̄. 
3. Let x,y ∈ Z. Then 
a. i(x + y) = i(x) + i(y); 
b. i(−x) = −i(x); 
c. i(xy) = i(x)i(y); 
d. x < y if and only if i(x) < i(y). 
4. For each r ∈ Q there are x,y ∈ Z such that y ≠ 0 and r = i(x)(i(y))⁻¹. 


Proof. We prove Part (4), leaving the rest to the reader in Exercise 1.5.5. 


(4) Let r ∈ Q. Then r = [(x,y)] for some (x,y) ∈ Z × Z*. We then have i(x) = 
[(x,1)] and i(y) = [(y,1)]. Because y ≠ 0, we see that [(y,1)] ≠ 0̄ by Exercise 1.5.3 (1). 
Hence i(y) ∈ Q*. It then follows from Definition 1.5.3 that i(x)(i(y))⁻¹ = 
[(x,1)] · [(y,1)]⁻¹ = [(x,1)] · [(1,y)] = [(x · 1, 1 · y)] = [(x,y)] = r. 
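
The statements of Theorem 1.5.6 can be spot-checked on representatives. The following 
Python sketch is not part of the original text; it assumes pairs of Python integers 
as representatives of classes, tests only a small sample of integers, and uses the 
hypothetical helper names related, add, mul and inv alongside the map i of the 
theorem. 

```python
def related(p, q):
    """Representatives (x, y) and (z, w) give equal classes when x*w == y*z."""
    (x, y), (z, w) = p, q
    return x * w == y * z

def add(p, q):
    (x, y), (z, w) = p, q
    return (x * w + y * z, y * w)

def mul(p, q):
    (x, y), (z, w) = p, q
    return (x * z, y * w)

def inv(p):
    x, y = p            # only meaningful when x != 0
    return (y, x)

def i(x):
    """The map of Theorem 1.5.6 on representatives: x goes to (x, 1)."""
    return (x, 1)

sample = range(-4, 5)

# Part (1): i(x) and i(y) are the same class only when x == y.
injective = all(related(i(x), i(y)) == (x == y) for x in sample for y in sample)
# Parts (3a) and (3c): i respects addition and multiplication.
respects_add = all(related(i(x + y), add(i(x), i(y))) for x in sample for y in sample)
respects_mul = all(related(i(x * y), mul(i(x), i(y))) for x in sample for y in sample)
# Part (4): (x, y) is recovered as i(x) times the inverse of i(y), for y != 0.
recovers = all(mul(i(x), inv(i(y))) == (x, y) for x in sample for y in sample if y != 0)
```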


Even though technically the set of integers and the set of rational numbers are 
entirely disjoint, we see from Theorem 1.5.6 that from the point of view of addition, 
multiplication, negation and the relation less than, the integers can be identified with 
a subset of the rational numbers. We can therefore dispense with the set of integers as 
a separate entity (except when we need it in proofs), because we have a copy of the 
integers inside the rational numbers that works just as well as the original. We will 
therefore also dispense with the notation 0̄ and 1̄, and simply write 0 and 1 instead. 
Everything that was proved about the natural numbers and the integers in previous 
sections, and which involves only addition, multiplication and the relation less than, 
still holds true when we think of the natural numbers and the integers as part of the set 
of rational numbers. 

We can draw another useful conclusion from Theorem 1.5.6 as well, but first we 
need the following definition. 


Definition 1.5.7. The binary operation − on Q is defined by r − s = r + (−s) for all 
r,s ∈ Q. The binary operation ÷ on Q* is defined by r ÷ s = rs⁻¹ for all r,s ∈ Q*; we 
also let 0 ÷ s = 0 · s⁻¹ = 0 for all s ∈ Q*. The number r ÷ s is also denoted r/s. △


Observe that if b ∈ Q then 0 − b = −b, and if b ≠ 0 then 1/b = b⁻¹. 

By thinking of the integers as sitting inside the rational numbers, the function i 
in Theorem 1.5.6 then becomes the function that takes each integer to itself. We can 
then combine Definition 1.5.7 with Theorem 1.5.6 (4) to see that for 
each r ∈ Q there are a,b ∈ Z such that b ≠ 0 and r = a/b. We can think of each integer 
n ∈ Z as being identified with the fraction n/1. Hence we have now come full circle in 
our discussion of the rational numbers, and we have recovered our original intuitive 
notion of rational numbers as fractions. Moreover, by the proof of Theorem 1.5.6 (4) 
we see that the rational number a/b is the same as the rational number [(a,b)]. We can 
then reformulate Definition 1.5.3 in the following lemma, which we state without 
proof, because it is just a matter of translating from one notation into another. 


Lemma 1.5.8. Let a,c ∈ Z and b,d ∈ Z*. 


1. a/b = c/d if and only if ad = bc. 
2. a/b + c/d = (ad + bc)/(bd). 
3. −(a/b) = (−a)/b. 
4. (a/b) · (c/d) = (ac)/(bd). 
5. If a ≠ 0, then (a/b)⁻¹ = b/a. 
6. If b > 0 and d > 0, or if b < 0 and d < 0, then a/b < c/d if and only if ad < bc; 
if b > 0 and d < 0, or if b < 0 and d > 0, then a/b < c/d if and only if ad > bc. 
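
These are the familiar rules for fractions, and they can be compared with an existing 
implementation of rational arithmetic. The following Python sketch is not part of the 
original text; it assumes that Python's standard fractions.Fraction type is an 
acceptable model of Q, and it checks the rules that follow from Definition 1.5.3 
(equality, addition, negation, multiplication, inverse and order) over a small range 
of numerators and non-zero denominators. The names nonzero and all_rules_hold are 
hypothetical. 

```python
from fractions import Fraction
from itertools import product

nonzero = [n for n in range(-3, 4) if n != 0]

all_rules_hold = True
for a, c in product(range(-3, 4), repeat=2):
    for b, d in product(nonzero, repeat=2):
        p, q = Fraction(a, b), Fraction(c, d)
        checks = [
            (p == q) == (a * d == b * c),          # equality of fractions
            p + q == Fraction(a * d + b * c, b * d),  # addition
            -p == Fraction(-a, b),                  # negation
            p * q == Fraction(a * c, b * d),        # multiplication
        ]
        if a != 0:
            checks.append(1 / p == Fraction(b, a))  # multiplicative inverse
        if (b > 0) == (d > 0):
            checks.append((p < q) == (a * d < b * c))  # denominators of equal sign
        else:
            checks.append((p < q) == (a * d > b * c))  # denominators of opposite sign
        all_rules_hold = all_rules_hold and all(checks)
```

Note that Fraction stores each value in lowest terms with a positive denominator, 
which amounts to choosing one canonical representative from each equivalence class. 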


We can now comfortably use fractions exactly as we learned to use them in 
elementary school, though with the knowledge that there is a completely rigorous 
foundation for all that we learned. 


Reflections 


The rational numbers do not always get the respect they deserve in real analysis 
courses. In the present chapter, where the real numbers are constructed via a process 
that starts with an axiomatic treatment of either the natural numbers or the integers, 
the rational numbers are sometimes viewed as merely a stepping stone on the way to 
the construction of the real numbers. In Chapter 2, where we start with an axiomatic 
treatment of the real numbers, the rational numbers are sometimes viewed as merely a 
subset of the real numbers that does not behave as nicely as the set of all real numbers. 

In fact, the rational numbers are extremely important from a variety of perspectives. 
From the point of view of applications, the rational numbers are very useful for 
many computations in the real world; indeed, in spite of the availability of modern 
calculators and computers, it would be a mistake to assume that fractions are no 
longer an important topic for school mathematics. From a theoretical point of view, 
the rational numbers play a very important role in abstract algebra and number theory. 
Even for real analysis the rational numbers are important from a pedagogical point 
of view. Much of what we do in real analysis makes crucial use of the Least Upper 
Bound Property of the real numbers, a property not satisfied by the rational numbers. 
In order to appreciate the role of this property in real analysis, it will be useful for the 
reader to ask herself whether or not various aspects of real analysis, to be encountered 
subsequently in this text, also work for the rational numbers. 


Exercises 


Exercise 1.5.1. [Used in Lemma 1.5.2.] Complete the proof of Lemma 1.5.2. That is, 
prove that the relation ≍ is reflexive and symmetric. 


Exercise 1.5.2. [Used in Lemma 1.5.4.] Complete the proof of Lemma 1.5.4. That is, 
prove that the binary operation +, the unary operation ⁻¹ and the relation <, all on Q, 
are well-defined. 


Exercise 1.5.3. [Used in Theorem 1.5.5 and Theorem 1.5.6.] Let x ∈ Z and y ∈ Z*. 


(1) Prove that [(x,y)] = 0̄ if and only if x = 0. 

(2) Prove that [(x,y)] = 1̄ if and only if x = y. 

(3) Prove that 0̄ < [(x,y)] if and only if 0 < xy. 


Exercise 1.5.4. [Used in Theorem 1.5.5.] Prove Theorem 1.5.5 (1) (2) (3) (5) (6) (8) 
(9) (11) (12) (14). 


Exercise 1.5.5. [Used in Theorem 1.5.6.] Prove Theorem 1.5.6 (1) (2) (3). 


Exercise 1.5.6. [Used in Exercise 1.6.2, Theorem 1.7.6 and Exercise 1.7.3.] Let r,s,p, 
q ∈ Q. 

(1) Prove that −1 < 0 < 1. 

(2) Prove that if r < s then −s < −r. 

(3) Prove that r · 0 = 0. 

(4) Prove that if r > 0 and s > 0, then r + s > 0 and rs > 0. 

(5) Prove that if r > 0, then 1/r > 0. 

(6) Prove that if 0 < r < s, then 1/s < 1/r. 

(7) Prove that if 0 < r < p and 0 < s < q, then rs < pq. 


Exercise 1.5.7. [Used in Lemma 1.6.2, Lemma 1.6.8 and Exercise 1.6.2.] 


(1) Prove that 1 < 2. 
(2) Let s,t ∈ Q. Suppose that s < t. Prove that (s + t)/2 ∈ Q, and that s < (s + t)/2 < t. 


Exercise 1.5.8. Let r € Q. Suppose that r > 0. 


(1) Prove that if r = a/b for some a,b ∈ Z such that b ≠ 0, then either a > 0 and 
b > 0, or a < 0 and b < 0. 
(2) Prove that r = m/n for some m,n ∈ Z such that m > 0 and n > 0. 


Exercise 1.5.9. [Used in Lemma 1.6.9 and Exercise 1.6.2.] Let r,s ∈ Q. 


(1) Suppose that r > 0 and s > 0. Prove that there is some n ∈ N such that s < nr. 
[Use Exercise 1.5.6, Exercise 1.5.8, and either Exercise 1.3.8 or Exercise 1.4.4.] 

(2) Suppose that r > 0. Prove that there is some m ∈ N such that 1/m < r. 

(3) For each x ∈ Q, let x² denote x · x. 
Suppose that r > 0 and s > 0. Prove that if r² < p, then there is some k ∈ N 
such that (r + 1/k)² < p. 
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The rational numbers work very well from the point of view of addition and multipli- 
cation, because of the existence of negatives and multiplicative inverses (of non-zero 
rational numbers), but they are still not satisfactory for doing real analysis. 

The most obvious flaw of the rational numbers is that it is not possible to solve—in 
the set of rational numbers—some polynomial equations with rational coefficients, for 
example x” — 2 = 0; we will see a proof of this fact in Theorem 2.6.11. By contrast, 
as we will see in Theorem 2.6.9, the equation x? —2 = 0 does have a solution in 
the set of real numbers. Hence, the real numbers, to be constructed in Section 1.7 
after preliminaries in the present section, are an improvement upon the rational 
numbers from the point of view of solving polynomial equations. It should be noted, 
however, that although the real numbers are better than the rational numbers for 
solving polynomial equations, the real numbers are also not entirely satisfactory in 
this regard, because there are some polynomial equations with rational coefficients, 
for example x” + 2 = 0, that have no solution in the real numbers. (This equation, 
and indeed all polynomial equations with rational coefficients, have solutions in the 
complex numbers. A very important theorem, called the Fundamental Theorem of 
Algebra, states that any polynomial with complex coefficients (which includes all 
rational and real coefficients) has a root in the complex numbers. Most introductory 
complex analysis texts include a proof of the Fundamental Theorem of Algebra; see 
[BCO09, Section 53]. We will not discuss complex numbers in this text. For a proof of 
the Fundamental Theorem of Algebra using topology rather than complex analysis, 
see [Mun00, Section 56].) 

As important as it is to be able to solve polynomial equations, however, that is not 
the difference between the rational numbers and the real numbers that makes the real 
numbers an appropriate place to do analysis and the rational numbers inappropriate 
for analysis. Notice, for example, that the rational numbers are missing not only 
numbers such as √2, which is the root of a polynomial with rational coefficients, 
but also very important numbers such as π and e, which are not the roots of any 
polynomial with rational coefficients, and are called “transcendental numbers.” See 
[Ste04, Sections 24.2 and 24.3] for proofs that π and e are transcendental, and see 
[Jac85, Section 4.12] for a more general result about transcendental numbers that 
includes both π and e. 

The essential difference between the rational numbers and the real numbers is that 
the real numbers are “complete” and the rational numbers are not. By complete we 
mean intuitively that there are no “gaps” between sets of numbers that ought not to 
have gaps between them. In the rational numbers, by contrast, there is, intuitively, a 
gap where each of the numbers such as √2, π and e ought to be, but is not. It is these 
sorts of gaps that make the rational numbers unfit for real analysis, because some 
important theorems in real analysis, for example the Extreme Value Theorem and the 
Intermediate Value Theorem (both proved in Section 3.5), do not work when there 
are gaps where there ought to be numbers. Technically, the idea of completeness for 
the real numbers can be formulated in two ways, one involving least upper bounds, 
and the other involving Cauchy sequences. The Least Upper Bound Property of the 
real numbers, which we will use directly and indirectly throughout this text, is first 
discussed in Section 1.7, and is discussed in greater detail in Section 2.6. Cauchy 
sequences are discussed in Section 8.3, though to ascertain their full importance as a 
tool for determining completeness the reader will have to wait until she encounters 
topics such as metric spaces that are beyond the scope of this book. 
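
To make the idea of a gap concrete, here is a minimal Python sketch, offered only as an illustration and not as part of the formal development. It runs the bisection method on the polynomial x² − 2 using exact rational arithmetic. The names f and bisect_in_q are ours and hypothetical. The polynomial changes sign between the rational numbers 1 and 2, yet no rational midpoint is ever a root, which is the kind of failure of the Intermediate Value Theorem in the rational numbers alluded to above.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """The polynomial x^2 - 2, evaluated exactly on a rational input."""
    return x * x - 2


def bisect_in_q(lo: Fraction, hi: Fraction, steps: int):
    """Run `steps` rounds of bisection using only rational arithmetic.

    Assumes f(lo) < 0 < f(hi). Returns the final bracket together with a
    flag recording whether some midpoint was an exact root.
    """
    hit_root = False
    for _ in range(steps):
        mid = (lo + hi) / 2
        value = f(mid)
        if value == 0:
            hit_root = True
            break
        if value < 0:
            lo = mid
        else:
            hi = mid
    return lo, hi, hit_root


lo, hi, hit_root = bisect_in_q(Fraction(1), Fraction(2), 40)
print(hit_root)        # whether an exact rational root was found
print(float(hi - lo))  # width of the bracket after 40 halvings
```

The bracket can be made as narrow as we like, but the search never terminates with a rational root, because by Theorem 2.6.11 there is none.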

Before we can discuss the Least Upper Bound Property of the real numbers, we 
need to have the set of real numbers, and our present task is to construct the real 
numbers from the rational numbers. This construction is somewhat more complicated 
than the construction of the integers from the natural numbers (in Section 1.3) or the 
construction of the rational numbers from the integers (in Section 1.5), and we will 
need the present section just for preliminaries, to be followed by the actual definition 
of the real numbers in Section 1.7. 

There are two standardly used approaches for constructing the real numbers from 
the rational numbers. One of the methods also involves Cauchy sequences; more 
precisely, it uses equivalence classes of Cauchy sequences of rational numbers. The 
other method, which is the approach we take, uses Dedekind cuts, to be defined shortly; 
this method does not involve equivalence classes. The Cauchy sequence method is 
quicker, but it involves the extra burden of learning about Cauchy sequences at this 
point in the development of the material; the proofs in the Dedekind cut method 
are lengthier, but have the advantage of involving nothing beyond what we have 
seen so far about the rational numbers. The original treatment of Dedekind cuts is 
in [Ded63]; our discussion of Dedekind cuts, using a slight variant of the original 
approach, follows [Rud53], [Men73] and [Bur67]. See [Sto79, Sections 3.5 and 3.6] 
or [Str00, Chapter 2] for the Cauchy sequence construction of the real numbers from 
the rational numbers. 

The intuitive idea behind Dedekind cuts is simple, even though their formal 
definition might appear somewhat technical at first. We want to use the rational 
numbers to construct the real numbers, and the key observation is that every real 
number can be characterized by the set of rational numbers that are greater than it. For 
example, the real number 1 is characterized by the set {x ∈ Q | x > 1}. Of course, the 
number 1 is a rational number, and so the set {x ∈ Q | x > 1} is defined in the realm 
of rational numbers, which is all we currently have at our disposal. On the other hand, 
the set {x ∈ Q | x > √2} is not describable using only the rational numbers, because 
√2 is not a rational number. Instead, we define Dedekind cuts to be subsets of Q that 
behave as sets of the form {x ∈ Q | x > r} for real numbers r ought to behave, while 
using a definition that is strictly in terms of the rational numbers. After proving some 
properties of Dedekind cuts in the present section, we will, in Section 1.7, define the 
set of real numbers to be the collection of all Dedekind cuts of Q. 

In order to focus on the key ideas in our discussion of Dedekind cuts, and in order 
to keep the proofs from being any longer than necessary, we will not cite references 
to the standard (and very familiar) properties of the rational numbers proved in 
Section 1.5. 


Definition 1.6.1. Let A ⊆ Q be a set. The set A is a Dedekind cut if the following 
three properties hold. 


a. A ≠ ∅ and A ≠ Q. 
b. Let x ∈ A. If y ∈ Q and y > x, then y ∈ A. 
c. Let x ∈ A. Then there is some y ∈ A such that y < x. △ 


The definition of Dedekind cuts is rather abstract, and before proving things about 
them, we need to verify that they actually exist. As seen in the following lemma, 
Dedekind cuts are in fact plentiful. 


Lemma 1.6.2. Let r ∈ Q. Then the set 


{x ∈ Q | x > r} 

is a Dedekind cut. 


Proof. Let D = {x ∈ Q | x > r}. We will show that D satisfies the three parts of the 
definition of Dedekind cuts. 


(a) We know that r − 1, r + 1 ∈ Q, and that r − 1 < r < r + 1. Hence r − 1 ∉ D and 
r + 1 ∈ D, and therefore D ≠ ∅ and D ≠ Q. 


(b) Let m ∈ D, and let y ∈ Q. Suppose that y > m. Because m > r, it follows that 
y > r. Hence y ∈ D. 


(c) Let n ∈ D. Then n > r. It follows from Exercise 1.5.7 (2) that (n + r)/2 ∈ Q, and 
r < (n + r)/2 < n. Hence (n + r)/2 ∈ D. 
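
The three parts of the proof of Lemma 1.6.2 can be spot-checked with exact rational arithmetic. The following Python sketch is only an illustration under our own naming (rational_cut and smaller_member are hypothetical helpers, and r = 3/7 is an arbitrary choice); checking finitely many values is of course not a proof.

```python
from fractions import Fraction


def rational_cut(r: Fraction):
    """Return the membership test for D = {x in Q | x > r}."""
    return lambda x: x > r


def smaller_member(r: Fraction, n: Fraction) -> Fraction:
    """Witness for Part (c): the midpoint (n + r)/2 used in the proof."""
    return (n + r) / 2


r = Fraction(3, 7)
in_D = rational_cut(r)

# Part (a): r - 1 lies outside D and r + 1 lies inside D.
print(in_D(r - 1), in_D(r + 1))

# Part (c): starting from n in D, the midpoint stays in D and is smaller.
n = Fraction(5)
for _ in range(5):
    m = smaller_member(r, n)
    assert in_D(m) and m < n
    n = m
print(n)
```

Repeating the midpoint step produces an endless decreasing chain of elements of D, which is why D has no least element.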


Because of Lemma 1.6.2 we know that Dedekind cuts exist, and there are at least 
as many of them as there are rational numbers. The natural question to ask next is 
whether all Dedekind cuts are of the form given in Lemma 1.6.2. Of course, if all 
Dedekind cuts had that form, then it would have been silly to have defined Dedekind 
cuts in the abstract way that we did, and so the reader might well guess, as will indeed 
be seen in the following example, that not all Dedekind cuts have the form given in 
Lemma 1.6.2. 


Example 1.6.3. To find a Dedekind cut that is not of the form given in Lemma 1.6.2, 
the simplest idea is to look at the set of all rational numbers greater than a real number 
that is not rational; the problem is how to describe such a set without making use of 
anything other than the rational numbers. In the case of the number r = √2, there turns 
out to be a simple solution to this problem, as we will now see. We note first, however, 
that we have not yet formally defined what “√2” means, nor proved that there is 
such a real number, though we will do so in Theorem 2.6.9 and Definition 2.6.10. We 
have also not yet proved that “√2” is not rational, a fact with which the reader is, at 
least informally, familiar; we will see a proof of this fact in Theorem 2.6.11. More 
precisely, it will be seen in that theorem that there is no rational number x such that 
x² = 2, and this last statement makes use only of rational numbers, so it is suited to 
our purpose at present. Nothing in our subsequent treatment of “√2” in Section 2.6 
makes use of the current example, so it will not be circular reasoning for us to make 
use of these subsequently proved facts here. 
Let 
T = {x ∈ Q | x > 0 and x² > 2}. (1.6.1) 


It is seen by Exercise 1.6.2 (1) that T is a Dedekind cut, and by Part (2) of that 
exercise it is seen that if T has the form {x ∈ Q | x > r} for some r ∈ Q, then r² = 2. 
By Theorem 2.6.11 we know that there is no rational number x such that x² = 2, and 
it follows that T is a Dedekind cut that is not of the form given in Lemma 1.6.2. ♦ 
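
As an informal illustration of why the set T satisfies Part (c) of Definition 1.6.1, the following Python sketch produces, for a given x ∈ T, a smaller element of T. The witness y = (x + 2/x)/2 is our own choice for this sketch and is not the route suggested by the hint to Exercise 1.6.2; it works because y² − 2 = ((x − 2/x)/2)², which is positive when x² > 2, and because 2/x < x when x² > 2. The function names are hypothetical.

```python
from fractions import Fraction


def in_T(x: Fraction) -> bool:
    """Membership test for T = {x in Q | x > 0 and x^2 > 2}."""
    return x > 0 and x * x > 2


def smaller_in_T(x: Fraction) -> Fraction:
    """Given x in T, return an element of T that is smaller than x.

    Uses y = (x + 2/x)/2, for which y^2 - 2 = ((x - 2/x)/2)^2 > 0.
    """
    assert in_T(x)
    return (x + 2 / x) / 2


x = Fraction(2)
for _ in range(4):
    y = smaller_in_T(x)
    assert in_T(y) and y < x  # y is again in T and strictly below x
    x = y
print(x, float(x * x - 2))
```

Every value produced is a rational number whose square exceeds 2, so the chain never reaches a least element of T, in line with the claim that T is not a rational cut.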


Example 1.6.3 explains the need for the following definition. 


Definition 1.6.4. Let r ∈ Q. The rational cut at r, denoted D_r, is the Dedekind cut 

D_r = {x ∈ Q | x > r}. 

An irrational cut is a Dedekind cut that is not a rational cut at any rational number. △ 


Before using Dedekind cuts to form the set of real numbers in Section 1.7, we will 
take the remainder of the present section to prove some technically useful properties of 
Dedekind cuts, starting with the following simple lemma that will be used frequently. 


Lemma 1.6.5. Let A ⊆ Q be a Dedekind cut. 


1. Q−A = {x ∈ Q | x < a for all a ∈ A}. 
2. Let x ∈ Q−A. If y ∈ Q and y < x, then y ∈ Q−A. 


Proof. 


(1) Let x ∈ Q−A. Let a ∈ A. We know that x < a or x = a or x > a. If x = a, then 
x would be in A by the definition of a, which is a contradiction. If x > a, then x would 
be in A by Part (b) of the definition of Dedekind cuts, which is a contradiction. Hence 
x < a. We conclude that Q−A ⊆ {x ∈ Q | x < a for all a ∈ A}. 

Now let y ∈ {x ∈ Q | x < a for all a ∈ A}. If y ∈ A, we would then have y < y, 
which is a contradiction. Hence y ∈ Q−A. We deduce that Q−A ⊇ {x ∈ Q | x < 
a for all a ∈ A}. Therefore Q−A = {x ∈ Q | x < a for all a ∈ A}. 


(2) This part follows easily from Part (1); the details are left to the reader. 


Our next two lemmas about Dedekind cuts are straightforward. 


Lemma 1.6.6. Let A, B ⊆ Q be Dedekind cuts. Then precisely one of A ⊊ B or A = B 
or B ⊊ A holds. 


Proof. If A = B there is nothing to prove, so assume that A ≠ B. There are now 
two cases. First, suppose that there is some a ∈ A such that a ∈ Q−B. Then by 
Lemma 1.6.5 (1) we know that a < b for all b ∈ B. By Part (b) of the definition 
of Dedekind cuts it follows that b ∈ A for all b ∈ B. Hence B ⊆ A. Because we are 
assuming that B ≠ A, then B ⊊ A. The second case is that there is some d ∈ B such 
that d ∈ Q−A, and a similar argument shows that A ⊊ B; we omit the details. 


Lemma 1.6.7. Let A be a non-empty family of subsets of Q. Suppose that X is a 
Dedekind cut for all X ∈ A. If ⋃_{X ∈ A} X ≠ Q, then ⋃_{X ∈ A} X is a Dedekind cut. 


Proof. Let B = ⋃_{X ∈ A} X. Assume that B ≠ Q. We will show that B satisfies the three 
parts of the definition of Dedekind cuts. 


(a) We know that X ≠ ∅ for all X ∈ A, so B ≠ ∅. By hypothesis we know that B ≠ Q. 


(b) Let b ∈ B, and let y ∈ Q. Suppose that y > b. We know that b ∈ X for some X ∈ A. 
By Part (b) of the definition of Dedekind cuts applied to X, we see that y ∈ X. Hence 
y ∈ B. 


(c) Let c ∈ B. Then c ∈ D for some D ∈ A. By Part (c) of the definition of Dedekind 
cuts there is some z ∈ D such that z < c. Hence z ∈ B. 


The following lemma about Dedekind cuts is somewhat technical, and has a 
slightly tedious proof, but it will be needed to define addition, multiplication, negation 
and multiplicative inverse for the real numbers in Section 1.7. 


Lemma 1.6.8. Let A, B ⊆ Q be Dedekind cuts. 


1. The set 

{r ∈ Q | r = a + b for some a ∈ A and b ∈ B} 

is a Dedekind cut. 
2. The set 

{r ∈ Q | −r < c for some c ∈ Q−A} 

is a Dedekind cut. 
3. Suppose that 0 ∈ Q−A and 0 ∈ Q−B. The set 

{r ∈ Q | r = ab for some a ∈ A and b ∈ B} 

is a Dedekind cut. 
4. Suppose that there is some q ∈ Q−A such that q > 0. The set 

{r ∈ Q | r > 0 and 1/r < c for some c ∈ Q−A} 

is a Dedekind cut. 


Proof. We will prove Parts (1), (2) and (4), leaving the remaining part to the reader 
in Exercise 1.6.3. 


(1) Let 

M = {r ∈ Q | r = a + b for some a ∈ A and b ∈ B}. 

We will show that M satisfies the three parts of the definition of Dedekind cuts. 


(a) We know that A ≠ ∅ and A ≠ Q, and B ≠ ∅ and B ≠ Q. Let x ∈ A, let p ∈ Q−A, 
let y ∈ B and let q ∈ Q−B. Then x + y ∈ M, so M ≠ ∅. We know by Lemma 1.6.5 (1) 
that p < a for all a ∈ A and q < b for all b ∈ B. It follows that p + q < a + b for all 
a ∈ A and b ∈ B. Hence p + q ∈ Q−M, and so M ≠ Q. 


(b) Let c ∈ M, and let y ∈ Q. Suppose that y > c. We know that c = a + b for some 
a ∈ A and b ∈ B. Then y = [a + (y − c)] + b. Because y > c, then a + (y − c) > a, and 
hence by Part (b) of the definition of Dedekind cuts we see that a + (y − c) ∈ A. It 
follows that y ∈ M. 


(c) Let d ∈ M. We know that d = s + t for some s ∈ A and t ∈ B. Applying Part (c) of 
the definition of Dedekind cuts to A, we see that there is some g ∈ A such that g < s. 
Then g + t ∈ M, and g + t < s + t = d. 


(2) Let 

N = {r ∈ Q | −r < c for some c ∈ Q−A}. 

We will show that N satisfies the three parts of the definition of Dedekind cuts. 


(a) We know that A ≠ ∅ and A ≠ Q. Let b ∈ Q−A. By Lemma 1.6.5 (2) we deduce that 
b − 1 ∈ Q−A. Then −(b − 1) ∈ N. Hence N ≠ ∅. Next, let a ∈ A. Then −(−a) ∉ Q−A, 
and therefore by Lemma 1.6.5 (2) we know that −(−a) ≮ g for all g ∈ Q−A. Hence 
−a ∈ Q−N, and so N ≠ Q. 


(b) Let d ∈ N, and let y ∈ Q. Suppose that y > d. It follows that −d > −y. By the 
definition of N, we know that −d < c for some c ∈ Q−A. Then −y < c, and therefore 
y ∈ N. 


(c) Let e ∈ N. Then −e < c for some c ∈ Q−A. Hence e > −c. Let s = (e + (−c))/2. It 
follows from Exercise 1.5.7 (2) that s ∈ Q, and −c < s < e. Hence −s < c. It follows 
that s ∈ N. 

(4) Because q ∈ Q−A and q > 0, it follows from Lemma 1.6.5 (1) that 0 < q < a 
for all a ∈ A. Let 

R = {r ∈ Q | r > 0 and 1/r < c for some c ∈ Q−A}. 

We will show that R satisfies the three parts of the definition of Dedekind cuts. 


(a) Clearly 0 ∉ R, so R ≠ Q. By Exercise 1.5.7 (2) we see that 0 < q/2 < q. Then 
1/(q/2) ∈ R, and hence R ≠ ∅. 


(b) Let w ∈ R, and let y ∈ Q. Suppose that y > w. We know w > 0, and hence y > 0. 
It follows that 1/y < 1/w. By the definition of R, we know that 1/w < c for some c ∈ Q−A. 
Then 1/y < c, and hence y ∈ R. 


(c) Let m ∈ R. Therefore m > 0, and 1/m < c for some c ∈ Q−A. 
Hence 0 < 1/c < m. Let t = (1/c + m)/2. It follows from Exercise 1.5.7 (2) that t ∈ Q and 
0 < 1/c < t < m. Therefore 0 < 1/t < c, and hence t ∈ R. 


Our final lemma of this section might appear somewhat unmotivated, but it will 
be used in the proof of Theorem 1.7.6, which states that the real numbers satisfy some 
basic algebraic properties. The proof of the following lemma is not lengthy, but it 
requires a powerful tool, namely, the Well-Ordering Principle (Theorem 1.2.10 or 
Axiom 1.4.4). 


Lemma 1.6.9. Let A ⊆ Q be a Dedekind cut. Let y ∈ Q. 


1. Suppose that y > 0. Then there are u ∈ A and v ∈ Q−A such that y = u − v, 
and v < e for some e ∈ Q−A. 

2. Suppose that y > 1, and that there is some q ∈ Q−A such that q > 0. Then 
there are r ∈ A and s ∈ Q−A such that s > 0, and y > r/s, and s < g for some 
g ∈ Q−A. 


Proof. 


(1) We follow [Rud53]. Because A ≠ ∅ and A ≠ Q there are w ∈ A and z ∈ Q−A. 
By Lemma 1.6.5 (1) we know that z < w. Because w − z > 0 and y > 0, it follows from 
Exercise 1.5.9 (1) that there is some n ∈ N such that w − z < ny. Hence w + n(−y) < z. 
By Lemma 1.6.5 (2) we deduce that w + n(−y) ∈ Q−A. 

Let 

G = {p ∈ N | w + p(−y) ∈ Q−A}. 

We know that n ∈ G, and so G ≠ ∅. By the Well-Ordering Principle (Theorem 1.2.10 
or Axiom 1.4.4), we know that there is some m ∈ G such that m ≤ g for all g ∈ G. It 
follows that w + m(−y) ∈ Q−A and that w + (m − 1)(−y) ∈ A. 

There are now two cases. First, suppose that there is some e ∈ Q−A such that 
w + m(−y) < e. We then let v = w + m(−y) and u = w + (m − 1)(−y). Clearly y = 
u − v, and v < e. Second, suppose that w + m(−y) ≥ q for all q ∈ Q−A. Because 
w + (m − 1/2)(−y) > w + m(−y) then w + (m − 1/2)(−y) ∈ A. Because w + m(−y) ∈ 
Q−A, then Lemma 1.6.5 (2) implies that w + (m + 1/2)(−y) ∈ Q−A. We let u = 
w + (m − 1/2)(−y), and v = w + (m + 1/2)(−y) and e = w + m(−y). Then y = u − v, and 
v < e, where e ∈ Q−A. 


(2) We follow [Men73] and [Bur67] in part. Because y > 1 and q > 0, then 
(y − 1)(q/2) > 0. By Part (1) of this lemma there are u ∈ A and v ∈ Q−A such that 
(y − 1)(q/2) = u − v and v < e for some e ∈ Q−A. 

There are now two cases. First, suppose that v > q/2. Because y − 1 > 0, it follows 
that (y − 1)v > (y − 1)(q/2) and hence (y − 1)v > u − v. Therefore yv > u. Because v > q/2 
we know that v > 0, and hence we conclude that y > u/v. We then let r = u and s = v, 
and evidently s > 0, and y > r/s and s < e. 

Second, suppose that v ≤ q/2. We let r = u + (3q/4 − v) and s = 3q/4. Because q > 0 
then s > 0. Because 3q/4 − v > 0, we see that r > u, and hence r ∈ A by Part (b) of the 
definition of Dedekind cuts. Also, because s < q and q ∈ Q−A, then s ∈ Q−A by 
Lemma 1.6.5 (2). Evidently r − s = u − v = (y − 1)(q/2). Because q > 0, we deduce that 
(y − 1)(3q/4) > r − s, and hence (y − 1)s > r − s, which yields ys > r. Because s > 0, we 
conclude that y > r/s. 
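
The search for the least element of the set G in the proof of Part (1) of Lemma 1.6.9 can be sketched in Python as follows. This is a hedged illustration under our own assumptions: a Dedekind cut is represented by a membership test in_A, the function name split_difference is hypothetical, and the sketch covers only the first case of the proof, so it does not check the extra condition that v < e for some e ∈ Q−A. For the cut T of Equation 1.6.1 that condition holds automatically, because Q−T has no greatest element.

```python
from fractions import Fraction


def in_T(x: Fraction) -> bool:
    """Membership test for T = {x in Q | x > 0 and x^2 > 2}."""
    return x > 0 and x * x > 2


def split_difference(in_A, w: Fraction, y: Fraction):
    """Find u in A and v outside A with u - v == y.

    in_A is the membership test of a Dedekind cut A, w is some element
    of A, and y > 0. The loop finds the least natural number m with
    w - m*y outside A, mirroring the use of the Well-Ordering Principle.
    """
    assert in_A(w) and y > 0
    m = 1
    while in_A(w - m * y):  # terminates by the Archimedean argument
        m += 1
    u = w - (m - 1) * y     # still in A, by minimality of m
    v = w - m * y           # first value outside A
    return u, v


u, v = split_difference(in_T, Fraction(2), Fraction(1, 10))
print(u, v, u - v)
```

Here the step y = 1/10 is walked down from w = 2 until the value leaves T, and the last two values visited are the required u and v.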


Reflections 


Proving a theorem is fun when either the theorem itself is interesting, or when the 
proof of the theorem is clever or insightful. Tedious proofs of dry technical results 
that are of interest only for proving something else yet to come are not particularly 
appealing, and, unfortunately, that is the nature of much of the present section, which 
consists of the preliminaries needed for the actual construction of the real numbers 
seen in the next section. Hence, the reader could not be faulted for skipping some of 
the details of some of the proofs in the present section upon first reading. However, for 
the sake of having a complete treatment of the real numbers, detailed proofs have been 
included, and are available for whoever wants to read them. Treatments of Dedekind 
cuts in some other books might appear to be more brief than our treatment, but that is 
only because we have put in details that are often omitted. 


Exercises 


Exercise 1.6.1. [Used in Exercise 1.6.5.] Let A, B ⊆ Q be Dedekind cuts. Suppose 
that A ⊊ B. Prove that B − A has more than one element. If you are familiar with the 
cardinality of sets, prove that B − A is countably infinite. 


Exercise 1.6.2. [Used in Example 1.6.3.] Let T be the set defined in Equation 1.6.1. 


(1) Prove that T is a Dedekind cut. 
(2) Prove that if T = D_r for some r ∈ Q, then r² = 2. 
[Use Exercise 1.5.6, Exercise 1.5.7 and Exercise 1.5.9 (3).] 


Exercise 1.6.3. [Used in Lemma 1.6.8.] Prove Lemma 1.6.8 (3). 


Exercise 1.6.4. [Used in Lemma 1.7.4.] Let A ⊆ Q be a Dedekind cut, and let r ∈ Q. 


(1) Prove that A ⊊ D_r if and only if there is some q ∈ Q−A such that r < q. 
(2) Prove that A ⊆ D_r if and only if r ∈ Q−A if and only if r < a for all a ∈ A. 


Exercise 1.6.5. What we call a Dedekind cut is often called an “upper cut,” to 
differentiate it from the analogous “lower cut.” Both types of cuts are equally valid, 
and are mirror images of each other, though upper cuts are slightly simpler to use 
because the product of positive numbers is positive, whereas the product of negative 
numbers is not negative. 

(1) Write a precise definition of lower cuts, modeled upon Definition 1.6.1. 

(2) Let A ⊆ Q be a Dedekind cut. Find an example to show that Q−A is not 
necessarily a lower cut. 

(3) Let A ⊆ Q be a Dedekind cut. Prove that if Q−A is not a lower cut, then 
Part (c) of the definition of lower cuts does not hold. Deduce that there is some 
m ∈ Q−A such that x ≤ m for all x ∈ Q−A. 

(4) Let A ⊆ Q be a Dedekind cut. Suppose that Q−A is not a lower cut. Prove 
that there is a unique element k ∈ Q−A such that Q − (A ∪ {k}) is a lower 
cut. 

(5) Let A ⊆ Q be a Dedekind cut. Suppose that Q−A is not a lower cut. Let k be 
as in Part (4) of this exercise. Prove that k ≤ x for all x ∈ A ∪ {k}. 

(6) Let D^u denote the set of all Dedekind cuts of Q, and let D^l denote the set 
of all lower cuts of Q. Prove that there is a bijective function φ: D^u → 
D^l such that A ⊆ B implies φ(A) ⊇ φ(B) for all A, B ∈ D^u. Because lower 
cuts are completely analogous to Dedekind cuts, you may assume that the 
analog of everything that has been previously proved about Dedekind cuts 
and lower cuts holds with the roles of Dedekind cuts and lower cuts reversed. 

[Use Exercise 1.6.1.] 


Exercise 1.6.6. [Used in Exercise 1.7.8.] In Definition 1.6.1, Dedekind cuts were 
defined as subsets of the set Q. However, an examination of this definition reveals 
that it does not make use of the full features of Q, but only the order relation < on Q. 
Hence, it is possible to define Dedekind cuts on sets equipped with only order relations, 
but not necessarily with binary operations such as addition and multiplication. 

Let S be a non-empty set, and let < be a relation on S. The relation < is an 
order relation if it satisfies the Trichotomy Law and the Transitive Law, as stated, 
for example, in Theorem 1.5.5 (10) (11); the set S is an ordered set if < is an order 
relation. For example, the natural numbers, the integers and the rational numbers 
are all ordered sets. Dedekind cuts can be defined for any ordered set exactly as in 
Definition 1.6.1. 


(1) Give an example of an ordered set for which the analog of Lemma 1.6.2 does 
not hold. 

(2) Find criteria on an ordered set that would guarantee that the analog of Lem- 
ma 1.6.2 holds. The criteria must be defined strictly in terms of the order 
relation. 

(3) Verify that the analog of Lemma 1.6.7 holds for arbitrary ordered sets. 


Having done the hard work regarding Dedekind cuts in Section 1.6, we are now ready 
to use them to define the set of real numbers. We want the set of real numbers to 
contain a copy of the set of rational numbers, and it should also have numbers such as 
√2, where the set of rational numbers has “gaps.” If we consider the fact that each 
rational number r corresponds to the rational cut D_r, and that there are also irrational 
cuts such as the one defined in Equation 1.6.1 (which appears as if it wants to be the 
set of all rational numbers greater than a number that is missing from the rational 
numbers), then we might guess that each Dedekind cut corresponds to a real number, 
and each real number corresponds to a Dedekind cut. Such a guess would be correct. 
Given that we have not yet seen a definition of the real numbers, we cannot prove 
such an intuitive correspondence between the real numbers and Dedekind cuts of 
rational numbers, but we can take it as inspiration for the following definition. 


Definition 1.7.1. The set of real numbers, denoted R, is defined by 

R = {A ⊆ Q | A is a Dedekind cut}. △ 


We note that in both the definition of the integers in terms of the natural numbers 
(in Definition 1.3.3), and the definition of the rational numbers in terms of the integers 
(in Definition 1.5.3), the constructions used equivalence classes. In Definition 1.7.1, 
by contrast, we have the advantage of not having to use an equivalence relation. 

We now turn to the definitions of addition, multiplication, negation, multiplicative 
inverse, less than and less than or equal to for the real numbers. We start with the 
last two of these, which are the simplest, and which are needed in the definitions of 
multiplication and multiplicative inverse. 

Given that Dedekind cuts are sets of rational numbers, we define the relation less 
than on the real numbers in terms of the relation “subset” on sets of rational numbers. 


Definition 1.7.2. The relation < on R is defined by A < B if and only if A ⊋ B, for 
all A, B ∈ R. The relation ≤ on R is defined by A ≤ B if and only if A ⊇ B, for all 
A, B ∈ R. △ 


The following definition of addition and negation for real numbers makes sense 
because of Lemma 1.6.8 (1) (2). 


Definition 1.7.3. The binary operation + on R is defined by 

A + B = {r ∈ Q | r = a + b for some a ∈ A and b ∈ B} 

for all A, B ∈ R. The unary operation − on R is defined by 

−A = {r ∈ Q | −r < c for some c ∈ Q−A} 

for all A ∈ R. △ 


The definition of multiplication and multiplicative inverse for real numbers is a 
bit more complicated than the definition of addition and negation, because we will 
need various cases. We start with the following lemma. 


Lemma 1.7.4. Let A ∈ R, and let r ∈ Q. 


1. A > D_r if and only if there is some q ∈ Q−A such that q > r. 
2. A ≥ D_r if and only if r ∈ Q−A if and only if a > r for all a ∈ A. 
3. If A < D_0 then −A ≥ D_0. 


Proof. 

(1) & (2) These are just restatements of Exercise 1.6.4. 


(3) Suppose that A < D_0. Then A ⊋ D_0. Because D_0 = {x ∈ Q | x > 0}, it follows 
that there is some q ∈ A such that q ≤ 0. By Part (b) of the definition of Dedekind cuts, 
we deduce that 0 ∈ A. Hence 0 ∉ Q−A, and therefore −0 ∉ Q−A, which implies 
that 0 ∉ −A, and hence 0 ∈ Q − (−A). By Part (2) of this lemma we deduce that 
−A ≥ D_0. 


The following definition of multiplication and multiplicative inverse makes sense 
because of Lemma 1.6.8 (3) (4) and Lemma 1.7.4. 


Definition 1.7.5. The binary operation · on R is defined by 

A · B = {r ∈ Q | r = ab for some a ∈ A and b ∈ B},   if A ≥ D_0 and B ≥ D_0 
A · B = −[(−A) · B],                                  if A < D_0 and B ≥ D_0 
A · B = −[A · (−B)],                                  if A ≥ D_0 and B < D_0 
A · B = (−A) · (−B),                                  if A < D_0 and B < D_0. 

The unary operation ⁻¹ on R − {D_0} is defined by 

A⁻¹ = {r ∈ Q | r > 0 and 1/r < c for some c ∈ Q−A},   if A > D_0 
A⁻¹ = −(−A)⁻¹,                                        if A < D_0. △ 


Having now defined the basic operations and relations on the real numbers, we 
are ready to prove the most fundamental algebraic properties of these numbers. The 
proof of the following theorem is lengthy, and tedious in parts, and the reader would 
not be faulted for skipping a few of the details upon first reading. As is usual, we will 
write “AB” instead of “A - B,” except in cases of potential ambiguity, or for ease of 
reading. We will write A > B to mean the same thing as B < A. 


Theorem 1.7.6. Let A, B, C ∈ R. 


1. (A + B) + C = A + (B + C) (Associative Law for Addition). 
2. A + B = B + A (Commutative Law for Addition). 
3. A + D_0 = A (Identity Law for Addition). 
4. A + (−A) = D_0 (Inverses Law for Addition). 
5. (AB)C = A(BC) (Associative Law for Multiplication). 
6. AB = BA (Commutative Law for Multiplication). 
7. A · D_1 = A (Identity Law for Multiplication). 
8. If A ≠ D_0, then AA⁻¹ = D_1 (Inverses Law for Multiplication). 
9. A(B + C) = AB + AC (Distributive Law). 
10. Precisely one of A < B or A = B or A > B holds (Trichotomy Law). 
11. If A < B and B < C, then A < C (Transitive Law). 
12. If A < B then A + C < B + C (Addition Law for Order). 
13. If A < B and C > D_0, then AC < BC (Multiplication Law for Order). 
14. D_0 < D_1 (Non-Triviality). 


Proof. We will prove the parts of this theorem not in the order in which they are 
stated (which is a standard way of writing the theorem), but rather in an order chosen 
so that the proof of each part relies only upon the previously proved parts. As was 
the case in Section 1.6, to avoid making the proof any longer than necessary, we 
will not cite references to the standard properties of the rational numbers proved in 
Section 1.5. 


(14) Left to the reader in Exercise 1.7.3. 


(1) Using the definition of addition for R we see that 

(A + B) + C = {r ∈ Q | r = (a + b) + c for some a ∈ A and b ∈ B and c ∈ C} 
            = {r ∈ Q | r = a + (b + c) for some a ∈ A and b ∈ B and c ∈ C} 
            = A + (B + C). 


(2) The proof of this part of the theorem is very similar to the proof of Part (1), 
and we omit the details. 


(3) Using the definition of addition for R we see that 

A + D_0 = {r ∈ Q | r = a + b for some a ∈ A and b ∈ D_0}. 

Let a ∈ A. Then by Part (c) of the definition of Dedekind cuts, there is some c ∈ A such 
that c < a. Then a − c > 0, and hence a − c ∈ D_0. Therefore a = c + (a − c) ∈ A + D_0. 
It follows that A ⊆ A + D_0. 

Let d ∈ A + D_0. Then d = s + t for some s ∈ A and t ∈ D_0. By the definition of 
D_0, we know that t > 0. Hence d > s. It follows from Part (b) of the definition of 
Dedekind cuts that d ∈ A. It follows that A + D_0 ⊆ A, and hence A + D_0 = A. 


(4) Using the definition of addition and negation for R we see that 

A + (−A) = {r ∈ Q | r = a + b for some a ∈ A and b ∈ −A}. 

Let x ∈ A + (−A). Then x = s + t for some s ∈ A and t ∈ −A. It follows from the 
definition of −A that −t < c for some c ∈ Q−A. By Lemma 1.6.5 (2) we know that 
−t ∈ Q−A, and hence by Lemma 1.6.5 (1) we see that −t < s. Hence s + t > 0, and 
therefore x > 0, which implies that x ∈ D_0. We deduce that A + (−A) ⊆ D_0. 

Now let y ∈ D_0. Then y > 0. It follows from Lemma 1.6.9 (1) that there are 
u ∈ A and v ∈ Q−A such that y = u − v, and v < e for some e ∈ Q−A. It follows 
that y = u + (−v), and −(−v) < e. Therefore −v ∈ −A, and hence y ∈ A + (−A). It 
follows that D_0 ⊆ A + (−A), and therefore A + (−A) = D_0. 


(10) This part of the theorem is just a restatement of Lemma 1.6.6. 


(11) This part of the theorem is just a restatement of a standard fact about subsets 
of sets; see [Blo10, Lemma 3.2.4]. 


(12) Suppose that A < B. Hence A ⊋ B. Let x ∈ B + C. Then x = u + v for some 
u ∈ B and v ∈ C. Then u ∈ A, so x ∈ A + C. It follows that A + C ⊇ B + C. There 
is some p ∈ A − B. Then by Lemma 1.6.5 (1) we know that p < b for all b ∈ B. 
Let c ∈ C. Then p + c < b + c for all b ∈ B. It follows from Lemma 1.6.5 (1) that 
p + c ∈ Q − (B + C). Because p + c ∈ A + C, we deduce that A + C ⊋ B + C. 


(5) Left to the reader in Exercise 1.7.5. 


(6) By Part (10) of this theorem, we know that either A ≥ D_0 or A < D_0, and 
similarly for B. There are now four cases. 
First, suppose that A ≥ D_0 and B ≥ D_0. Then 

AB = {r ∈ Q | r = ab for some a ∈ A and b ∈ B} 
   = {r ∈ Q | r = ba for some a ∈ A and b ∈ B} = BA. 

Second, suppose that A ≥ D_0 and B < D_0. By Exercise 1.7.4 (1) we know that 
−B > D_0. The definition of multiplication of Dedekind cuts, together with the case 
we have already proved, imply that AB = −[A(−B)] = −[(−B)A] = BA. 

The other two cases, where A < D_0 and B ≥ D_0, and where A < D_0 and B < D_0, 
are very similar to the case just proved, and we omit the details. 


(7) Left to the reader in Exercise 1.7.5. 


(8) Suppose that A ≠ D_0. By Part (10) of this theorem we know that either A > D_0 
or A < D_0. First, suppose that A > D_0. By Exercise 1.7.2 (2) we know that A⁻¹ > D_0. 
It then follows from the definition of multiplication of Dedekind cuts that 

AA⁻¹ = {r ∈ Q | r = ab for some a ∈ A and b ∈ A⁻¹}. 

Let x ∈ AA⁻¹. Then x = uv for some u ∈ A and v ∈ A⁻¹. By Lemma 1.7.4 (2) we 
know that u > 0, and by the definition of A⁻¹ when A > D_0 we know that v > 0, and 
that 1/v < h for some h ∈ Q−A. By Lemma 1.6.5 (2) we see that 1/v ∈ Q−A, and hence 
by Lemma 1.6.5 (1) we know that 1/v < u. Hence 1 < uv, and therefore x > 1, which 
means that x ∈ D_1. It follows that AA⁻¹ ⊆ D_1. 

Now let y ∈ D_1. Then y > 1. Because A > D_0, we know by Lemma 1.7.4 (1) 
that there is some q ∈ Q−A such that q > 0. It then follows from Lemma 1.6.9 (2) 
that there are r ∈ A and s ∈ Q−A such that s > 0, and y > r/s and s < k for some 
k ∈ Q−A. We know that 1/s > 0, and that 1/(1/s) < k, and we deduce that 1/s ∈ A⁻¹. Hence 
r/s = r · (1/s) ∈ AA⁻¹. Because y > r/s, it follows from Part (b) of the definition of Dedekind 
cuts that y ∈ AA⁻¹. Hence D_1 ⊆ AA⁻¹, and therefore AA⁻¹ = D_1. 

Next, suppose that A < D_0. Then by the definition of multiplicative inverse we
see that A^{-1} = −(−A)^{-1}. Hence −A^{-1} = −[−(−A)^{-1}], and it follows from
Exercise 1.7.4 (2) that −A^{-1} = (−A)^{-1}. By Exercise 1.7.4 (1) we know that −A > D_0,
and by an argument similar to one used in the case where A > D_0, we see that
(−A)^{-1} > D_0. Therefore −A^{-1} > D_0, and hence by Exercise 1.7.4 (1) again we
deduce that A^{-1} = −(−A)^{-1} < D_0. We now know by the definition of multiplication
of Dedekind cuts, combined with the previous case, that AA^{-1} = (−A)(−A^{-1}) =
(−A)(−A)^{-1} = D_1.


(9) We follow [Men73]. There are eight cases.

First, suppose that A ≥ D_0 and B ≥ D_0 and C ≥ D_0. By Lemma 1.7.4 (2), we
know that every element of each of the sets A, B and C is greater than 0. It follows
that every element in B + C is greater than 0, and hence by Lemma 1.7.4 (2) we know
that B + C ≥ D_0.

Using the definition of AB, when A ≥ D_0 and B ≥ D_0, we see that


AB = {r ∈ Q | r = ab for some a ∈ A and b ∈ B}.


It follows that every element of AB is greater than 0. A similar argument shows that
every element of each of the sets AC, A(B + C) and AB + AC is greater than 0.

Let x ∈ A(B + C). As just noted, we know x > 0. Also, we know by definition
that x = aq for some a ∈ A and q ∈ B + C, and that q = b + c for some b ∈ B and
c ∈ C. Hence x = a(b + c) = ab + ac. Because ab ∈ AB and ac ∈ AC, we deduce that
x ∈ AB + AC. Hence A(B + C) ⊆ AB + AC.

Now let y ∈ AB + AC. Then y = u + v for some u ∈ AB and v ∈ AC. Hence u = a_1 b
and v = a_2 c for some a_1, a_2 ∈ A, and b ∈ B and c ∈ C. If a_1 = a_2, then y = a_1(b + c),
and so y ∈ A(B + C). Now suppose that a_1 ≠ a_2. Without loss of generality, assume
that a_1 > a_2. Observe that a_1 b/a_2 > b, and hence by Part (b) of the definition of
Dedekind cuts we see that a_1 b/a_2 ∈ B. Because y = a_2[a_1 b/a_2 + c], we deduce that
y ∈ A(B + C). Therefore AB + AC ⊆ A(B + C), and hence A(B + C) = AB + AC.
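
The regrouping step in the preceding paragraph can be checked on concrete numbers.
The following Python sketch is only an illustration and not part of the proof; the
values a1, a2, b and c are arbitrary positive rationals chosen here to stand in for
elements of A, B and C.

```python
from fractions import Fraction

# Arbitrary positive rationals with a1 > a2 (illustrative values only).
a1, a2 = Fraction(3), Fraction(2)
b, c = Fraction(5), Fraction(7)

y = a1 * b + a2 * c            # an element of the form u + v, with u = a1*b and v = a2*c
b_new = a1 * b / a2            # the replacement for b used in the proof

# b_new > b, so b_new belongs to B whenever b does (cuts are closed upward).
print(b_new > b)               # True
# y is regrouped as a single product a2 * (b_new + c).
print(y == a2 * (b_new + c))   # True
```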

The remaining cases all make use of the case that we have already proved, together 
with Exercise 1.7.4; for the sake of brevity we will not cite that exercise when we use 
it. 


For our second case, suppose that A ≥ D_0 and B ≥ D_0 and C < D_0. Then −C >
D_0. There are two subcases, depending upon whether B + C ≥ D_0 or B + C < D_0.
First, suppose that B + C ≥ D_0. By the definition of multiplication of Dedekind cuts,
we know that AC = −[A(−C)], and hence −(AC) = A(−C). By the definition of
multiplication of Dedekind cuts, the case we have already proved, and Parts (3)
and (4) of this theorem, we see that AB + AC = A[(−C) + (B + C)] + AC = A(−C) +
A(B + C) + AC = −(AC) + A(B + C) + AC = A(B + C). Second, suppose that B + C <
D_0. Then −(B + C) > D_0. Reasoning similar to the above shows that AB + AC =
AB + {−[A(−C)]} = AB + {−[A[B + [−(B + C)]]]} = AB + {−[AB + A[−(B + C)]]} =
AB + {−[AB]} + {−[A[−(B + C)]]} = −[A[−(B + C)]] = A(B + C).

Third, suppose that A ≥ D_0 and B < D_0 and C ≥ D_0. This case is just like the
previous case, and we omit the details.

Fourth, suppose that A ≥ D_0 and B < D_0 and C < D_0. Then B + C < D_0, and −B >
D_0 and −C > D_0 and −(B + C) > D_0. Then, using the definition of multiplication
of Dedekind cuts, we see that A(B + C) = −[A[−(B + C)]] = −{A[(−B) + (−C)]} =
−{[A(−B)] + [A(−C)]} = {−[A(−B)]} + {−[A(−C)]} = AB + AC.

There are four other cases, which are similar to the cases we have already seen,
and which are left to the reader in Exercise 1.7.6.


(13) Suppose that A < B and C > D_0. By Parts (4) and (12) of this theorem
we see that D_0 = A + (−A) < B + (−A). Hence by Exercise 1.7.2 (1) we see that
[B + (−A)]C > D_0. It follows from Parts (12) and (3) of this theorem that AC +
[B + (−A)]C > AC + D_0 = AC. By Parts (6) and (9) of this theorem we deduce that
{A + [B + (−A)]}C > AC, which by Exercise 1.7.4 (5) implies that BC > AC.


The observant reader will have noticed that the properties of the real numbers 
listed in Theorem 1.7.6 are identical to the properties of the rational numbers listed in 
Theorem 1.5.5. Hence, as mentioned in Section 1.5, the set of real numbers is called 
an “ordered field.” 

Because the rational numbers and the real numbers share the same algebraic 
properties listed in Theorem 1.5.5 and Theorem 1.7.6, these properties alone do not 
suffice to distinguish these two sets of numbers. We now turn to a property of the 
real numbers that is not satisfied by the rational numbers. This property, which is 
called the Least Upper Bound Property, and is stated in Theorem 1.7.9 below, in 
fact characterizes the real numbers when taken together with the algebraic properties 
stated in Theorem 1.7.6. That is, not only do the real numbers satisfy Theorem 1.7.6 
and Theorem 1.7.9, but essentially nothing else satisfies these two theorems, as will 
be proved in Section 2.7. 

In order to state the Least Upper Bound Property, we need the following definition. 


Definition 1.7.7. Let A ⊆ R be a set.


1. The set A is bounded above if there is some M ∈ R such that X ≤ M for all
X ∈ A. The number M is called an upper bound of A.

2. The set A is bounded below if there is some P ∈ R such that X ≥ P for all
X ∈ A. The number P is called a lower bound of A.

3. The set A is bounded if it is bounded above and bounded below.

4. Let M ∈ R. The number M is a least upper bound (also called a supremum)
of A if M is an upper bound of A, and if M ≤ T for all upper bounds T of A.

5. Let P ∈ R. The number P is a greatest lower bound (also called an infimum)
of A if P is a lower bound of A, and if P ≥ V for all lower bounds V of A.


As the reader is asked to verify in Exercise 2.3.11, a subset A ⊆ R is bounded if
and only if there is some M ∈ R such that |X| ≤ M for all X ∈ A; it is always possible
to choose M so that M > 0. The proof of this fact has to wait until Section 2.3 because
that is where we define absolute value.

Further discussion of upper bounds and lower bounds, and least upper bounds and 
greatest lower bounds, including examples, will be given in Section 2.6. 
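
To see concretely how a least upper bound can fail to exist within the rational
numbers, consider the set S = {r ∈ Q | r > 0 and r² < 2}. The sketch below, a minimal
illustration in Python using exact rational arithmetic, takes any positive rational
upper bound u of S (necessarily u² > 2, because no rational number has square 2) and
returns a strictly smaller rational upper bound of S. The function name
smaller_upper_bound and the averaging formula (u + 2/u)/2 are choices made for this
illustration and are not notation from the text.

```python
from fractions import Fraction

def smaller_upper_bound(u: Fraction) -> Fraction:
    """Given a positive rational u with u*u > 2, return a smaller one with the same property.

    Any such u is an upper bound of S = {r in Q : r > 0 and r*r < 2}.
    """
    if u <= 0 or u * u <= 2:
        raise ValueError("u must be a positive rational with u*u > 2")
    # The average of u and 2/u is less than u, and its square still exceeds 2.
    return (u + 2 / u) / 2

# Start from the upper bound 2 and improve it four times.
bounds = [Fraction(2)]
for _ in range(4):
    bounds.append(smaller_upper_bound(bounds[-1]))

print(bounds[1], bounds[2], bounds[3])  # 3/2 17/12 577/408
```

Because every rational upper bound of S can be improved in this way, S has no least
upper bound among the rational numbers.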

Before stating and proving the Least Upper Bound Property, we first state and 
prove the “mirror image” of this property, namely, the Greatest Lower Bound Property. 
These two properties are completely equivalent, in that each one implies the other. The 
Least Upper Bound Property is more commonly used, but the definition of Dedekind 
cuts makes it easier to prove the Greatest Lower Bound Property first. 


Theorem 1.7.8 (Greatest Lower Bound Property). Let A ⊆ R be a set. If A is
non-empty and bounded below, then A has a greatest lower bound.


Proof. Suppose that A is non-empty and bounded below. Let L = ⋃_{X∈A} X, where
the union makes sense because we can think of the elements of R as Dedekind cuts,
which are subsets of Q. We will show that L = glb A. Let B ∈ R be a lower bound of A.
Because B ≤ X for all X ∈ A, then B ⊇ X for all X ∈ A. It follows that L = ⋃_{X∈A} X ⊆ B.
Because B ≠ Q by Part (a) of the definition of Dedekind cuts, we see that L ≠ Q. We
then use Lemma 1.6.7 to deduce that L is a Dedekind cut.

Clearly X ⊆ L for all X ∈ A, and hence L ≤ X for all X ∈ A. It follows that L
is a lower bound of A. Now let C ∈ R be a lower bound of A. Then C ≤ X for all
X ∈ A, and hence C ⊇ X for all X ∈ A. By a standard property of unions of sets (see
[Blo10, Theorem 3.4.5]), we deduce that C ⊇ ⋃_{X∈A} X = L. Hence C ≤ L, and we
conclude that L is the greatest lower bound of A.
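
The construction of L as a union can be illustrated on a finite family of rational
cuts. The Python sketch below models a subset of Q by its membership test and compares
sets only on a finite grid of sample rationals, so it is a sanity check rather than a
proof. It assumes that D_r = {x ∈ Q | x > r}; the names Cut, D, union and samples are
introduced here purely for illustration.

```python
from fractions import Fraction
from typing import Callable, Iterable

# A subset of Q is modeled by its membership test (an assumption of this sketch).
Cut = Callable[[Fraction], bool]

def D(r: Fraction) -> Cut:
    """Rational cut D_r = {x in Q | x > r}."""
    return lambda x: x > r

def union(cuts: Iterable[Cut]) -> Cut:
    """Membership test for the union of the given sets."""
    members = list(cuts)
    return lambda x: any(cut(x) for cut in members)

family = [D(Fraction(1, 2)), D(Fraction(3)), D(Fraction(-2))]
L = union(family)

# Finite grid of sample rationals from -5 to 5 in steps of 1/4.
samples = [Fraction(n, 4) for n in range(-20, 21)]

# On the samples, the union coincides with D_{-2}, the cut of the smallest r.
agrees = all(L(x) == D(Fraction(-2))(x) for x in samples)
# Each X in the family is a subset of L, which is what L <= X means for these cuts.
is_lower_bound = all(L(x) for X in family for x in samples if X(x))
print(agrees, is_lower_bound)  # True True
```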


We now see that the Least Upper Bound Property is a straightforward consequence 
of the Greatest Lower Bound Property (Theorem 1.7.8). Observe that the proof of the 
following theorem uses nothing other than Definition 1.7.7 and the statement of the 
Greatest Lower Bound Property; in particular, the proof does not make any mention 
of Dedekind cuts. 


Theorem 1.7.9 (Least Upper Bound Property). Let A ⊆ R be a set. If A is
non-empty and bounded above, then A has a least upper bound.


Proof. Suppose that A is non-empty and bounded above. Let

U = {X ∈ R | X is an upper bound of A}.


Then U ≠ ∅, because A is bounded above. Let B ∈ A. Then B ≤ X for all X ∈ U, and
hence B is a lower bound of U. Because U is non-empty and bounded below, we can
apply the Greatest Lower Bound Property (Theorem 1.7.8) to U to deduce that U has
a greatest lower bound, say L.

If C ∈ A, then C ≤ X for all X ∈ U. Therefore every element of A is a lower bound
of U. Because L is the greatest lower bound of U, it follows that C ≤ L for all C ∈ A.
Hence L is an upper bound of A. Because L is a lower bound of U, then L ≤ X for
every X ∈ U. We deduce that L is a least upper bound of A.


We are now in a situation analogous to the construction of the integers from
the natural numbers (in Section 1.3), and of the rational numbers from the integers (in
Section 1.5). We have two sets of numbers, namely, the set of real numbers and
the set of rational numbers; informally one set is viewed as containing the other,
but formally the two sets as constructed are entirely disjoint. As expected,
however, we can find a copy of the set of rational numbers inside the set of real
numbers by identifying each rational number r with the real number D_r. We will see
in the following theorem that this identification preserves the numbers 0 and 1, the
binary operations addition and multiplication, and the relation less than.


Theorem 1.7.10. Let i: Q → R be defined by i(r) = D_r for all r ∈ Q.
1. The function i: Q → R is injective.
2. i(0) = D_0 and i(1) = D_1.
3. Let r, s ∈ Q. Then
a. i(r + s) = i(r) + i(s);
b. i(−r) = −i(r);
c. i(rs) = i(r)i(s);
d. if r ≠ 0 then i(r^{-1}) = [i(r)]^{-1};
e. r < s if and only if i(r) < i(s).


Proof. Left to the reader in Exercise 1.7.7. 
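
One inclusion relevant to Part (3a), namely D_{r+s} ⊆ D_r + D_s, comes down to splitting
the gap x − (r + s) evenly between the two summands. The Python sketch below illustrates
that splitting under the assumption that D_r = {x ∈ Q | x > r}; the function name
sum_witness is hypothetical, and the sketch does not replace the proof requested in
Exercise 1.7.7.

```python
from fractions import Fraction
from typing import Optional, Tuple

def sum_witness(r: Fraction, s: Fraction, x: Fraction) -> Optional[Tuple[Fraction, Fraction]]:
    """Return (a, b) with a > r, b > s and a + b == x when x > r + s, else None."""
    gap = x - (r + s)
    if gap <= 0:
        return None              # x is not in D_{r+s}
    half = gap / 2
    return r + half, s + half    # each summand exceeds its bound by half the gap

r, s = Fraction(1, 3), Fraction(-5, 7)
a, b = sum_witness(r, s, Fraction(0))
print(a > r, b > s, a + b == 0)      # True True True
print(sum_witness(r, s, r + s))      # None
```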


Again analogously to our previous constructions, we see from Theorem 1.7.10 
that even though technically the set of rational numbers and the set of real numbers are 
entirely disjoint, in fact from the point of view of addition, multiplication, negation, 
multiplicative inverse and the relation less than, the rational numbers can be identified 
with a subset of the real numbers. We therefore dispense with the set of rational 
numbers as a separate entity (except when we need it in proofs), because we have a 
copy of the rational numbers inside the real numbers that works just as well as the 
original. We will therefore also dispense with the notation D_0 and D_1, and simply
write 0 and 1 instead.

Moreover, not only will we dispense with the Dedekind cut notation from now 
on, we will not need to use the concept of Dedekind cuts at all after this section. All 
further properties of the real numbers will now be derived from the statements of 
Theorem 1.7.6 and Theorem 1.7.9, without any reference to the fact that we proved 
these two theorems using Dedekind cuts. Our construction of the real numbers proves 
that there exists a set with the properties that we would expect of the real numbers, 
but ultimately, it is the properties of the real numbers, not how they are constructed, 
that matter. 


Reflections 


As mentioned in Exercise 1.6.5, what we call a Dedekind cut is often called an 
“upper cut,” to differentiate it from the analogous “lower cut.” Both types of cuts, 
which are mirror images of each other, can be used to construct the real numbers. 
We have chosen to work with upper cuts because they are technically slightly easier 
to use, due to the fact that the product of positive numbers is positive, whereas the 
product of negative numbers is not negative. On the other hand, there is an advantage 
to using lower cuts, which is that they allow for a direct proof of the Least Upper 
Bound Property, as opposed to upper cuts, which lead naturally to the Greatest Lower 
Bound Property, and only from there do we arrive at the Least Upper Bound Property. 
Of course, the Greatest Lower Bound Property is just as good as the Least Upper 
Bound Property, and it is possible to use the former instead of the latter, but we follow 
the standard approach used today and focus on the Least Upper Bound Property as 
the fundamental property. 

Our construction of the real numbers follows the traditional route of starting with 
the natural numbers, constructing the integers from the natural numbers, constructing 
the rational numbers from the integers and finally constructing the real numbers from 
the rational numbers. It turns out that there is a more efficient construction of the real 
numbers directly from the integers; see [A’C], [Str] or [DKO*] for details. We have 
chosen to stay with the traditional route, however, because the steps used, while less
efficient, seem more natural, and because the rational numbers are important in their
own right, and deserve to have their properties discussed. 

We have now come to the end of our construction of the number systems. It 
is possible to construct the complex numbers from the real numbers, but we will 
not need the complex numbers in a book on real analysis. Interestingly, although 
the complex numbers have many fascinating and important properties, and complex 
analysis is a subject well worth studying, the construction of the complex numbers 
from the real numbers is quite simple in comparison to the constructions we have 
seen in this chapter. 


Exercises 


Exercise 1.7.1. [Used in Exercise 1.7.7.] Let r € Q. 


(1) Prove that D_{−r} = −D_r, using only Definition 1.6.4 and Definition 1.7.3.
(2) Prove that D_{r^{-1}} = [D_r]^{-1}, using only Definition 1.7.5 and Definition 1.7.3.


Exercise 1.7.2. [Used in Theorem 1.7.6.] Let A, B ∈ R. Suppose that A > D_0 and
B > D_0. For this exercise, you may use only results prior to Theorem 1.7.6.


(1) Prove that AB > D_0.
(2) Prove that A^{-1} > D_0.


Exercise 1.7.3. [Used in Theorem 1.7.6.] Prove Theorem 1.7.6 (14). For this exercise, 
you may use only results prior to Theorem 1.7.6. [Use Exercise 1.5.6 (1).] 


Exercise 1.7.4. [Used in Theorem 1.7.6, Exercise 1.7.5 and Exercise 1.7.6.] For this 
exercise, use only the properties of real numbers stated in Theorem 1.7.6 (1) (2) (3) 
(4) (10) (11) (12) (14); it is not necessary to use the definition of real numbers as 
Dedekind cuts. Let A,B € R. 


(1) Prove that A > D_0 if and only if −A < D_0, and that A < D_0 if and only if
−A > D_0.

(2) Prove that −(−A) = A.

(3) Prove that −(A + B) = (−A) + (−B).

(4) Prove that if A > D_0 and B > D_0, then A + B > D_0, and that if A < D_0 and
B < D_0, then A + B < D_0.

(5) Prove that A = (−B) + (A + B) = B + [A + (−B)] and −A = B + [−(B + A)].


Exercise 1.7.5. [Used in Theorem 1.7.6.] Prove Theorem 1.7.6 (5) (7). For this
exercise, you may use only Parts (1), (2), (3), (4), (10), (11), (12) and (14) of the
theorem, and anything prior to the theorem. [Use Exercise 1.7.4.]


Exercise 1.7.6. [Used in Theorem 1.7.6.] Prove the remaining four cases in the proof 
of Theorem 1.7.6 (9). [Use Exercise 1.7.4.] 


Exercise 1.7.7. [Used in Theorem 1.7.10.] Prove Theorem 1.7.10. 
[Use Exercise 1.7.1.]


Exercise 1.7.8. This exercise makes use of Exercise 1.6.6. Let S be a non-empty
ordered set. The Dedekind set of S, denoted S^D, is defined by


S^D = {A ⊆ S | A is a Dedekind cut}.


For example, we know by definition that Q^D = R. The order relation < on S^D is
defined analogously to Definition 1.7.2.


(1) Find an example of an ordered set T for which T^D = ∅. It is sufficient to state
informally the reason why your example works.

(2) Find an example of an ordered set U for which U^D has exactly one element. It
is sufficient to state informally the reason why your example works.

(3) Verify that S^D satisfies the Least Upper Bound Property.

(4) What can you say about R^D? It is sufficient to answer this question informally.


1.8 Historical Remarks 


When the use of numbers in the ancient world is discussed, it is common to consider 
the different methods by which numbers were written in different cultures, the most 
familiar of such methods being Roman numerals. From the perspective of real analysis, 
however, our interest in the history of numbers is concerned very little with the way 
in which numbers were written in various periods of history, and instead with what 
the concept of “number,” however written, was understood to mean. In real analysis 
we make extensive use of the properties of the real numbers, and hence we need 
to know what these numbers are, but we make no use of how numbers are written. 
We will discuss the decimal place-value system in detail in Section 2.8, and briefly 
here, because this notation was historically important in the development of calculus 
because it facilitated numerical calculation. Nonetheless, from the perspective of real 
analysis, the ability to write numbers in the decimal place-value system is a nice 
application of the axioms for the real numbers, but it is not a tool used in proofs. 
(The decimal place-value system is famously used in Cantor’s diagonal argument 
that shows that the set of real numbers is uncountable, but we will not use that fact. 
Moreover, as a nice application of sequences, we will provide a different proof, also 
due to Cantor, of the uncountability of the real numbers that does not involve decimals; 
see Theorem 8.4.8.) 

The real numbers as we now understand them, and as we use them in real analysis, 
consist of a variety of types of numbers, including natural numbers, zero, negative 
integers, rational numbers that are not integers, irrational numbers that are algebraic 
(that is, irrational numbers that are the roots of polynomials with rational coefficients) 
and irrational numbers that are transcendental (that is, numbers that are not algebraic). 
We will not discuss the complex numbers in these historical comments, because we 
do not use them in this text. 


Ancient World 


The first numbers to be used were the natural numbers, that is, the positive whole 
numbers, which were used to count objects. Such numbers arose in many ancient
cultures, as did methods for adding and multiplying these numbers; the particular
algorithms for addition and multiplication used by various cultures are not of interest 
to us here. The need for fractions also arose in many ancient cultures. The rest of the 
numbers, however, were slower to be recognized. 

In the ancient Middle East and Europe, the word “number” meant positive integers
and their ratios. The number 1 was not always viewed as a number; Euclid (c. 325-
c. 265 BCE), for example, did not regard 1 as a number.

Negative numbers and zero were understood as numbers at some point in ancient 
China and India; it seems that negative numbers were used before zero was used. 
In ancient China calculations were done with rods and the abacus, and zero is not 
needed in that case, because it suffices to have the absence of a rod or bead. Positive 
and negative numbers, on the other hand, were represented by rods of different colors. 
Rules for addition and subtraction of signed numbers were known in ancient China 
in the 2nd century; rules for multiplication and division of signed numbers were 
available only in 1303 in Suanshu Chimeng. 

Ancient texts in a variety of cultures had approximate rational values for numbers
such as √2 and π that we now know are irrational. However, whereas it was known
that such approximations were not the exact value of the number, there did not appear 
to be an awareness in ancient China, Mesopotamia, Egypt or India that it was not 
possible to achieve an exact rational value; that is, these cultures did not recognize 
the concept of irrational numbers. 

The idea of incommensurable pairs of lengths of line segments was discovered in 
ancient Greece; ratios of such lengths represent what we call irrational numbers. The 
fact that √2 is irrational, expressed geometrically in terms of the incommensurability
of the diagonal and sides of a square, is often attributed to Pythagoras of Samos
(c. 569-c. 475 BCE) or his followers, and was presumably known by the time of
Theodorus of Cyrene (465-398 BCE), who understood that 3, 5, 6, 7, 8, 10, 11, 
12, 13, 14, 15 and 17 do not have rational square roots, a fact referred to by Plato 
(427-347 BCE). The irrationality of such numbers is proved in Book X of Euclid’s 
Elements, which is attributed (at least in part) to Theaetetus of Athens (c. 417-c. 369 
BCE), who was a student of Theodorus. 

Even though the ancient Greeks were aware of incommensurable lengths of line 
segments, this discovery did not lead to the adoption of irrational numbers as numbers, 
perhaps due to the ancient Greek separation of arithmetic from geometry. Aristotle 
(384-322 BCE), in Book VI of the Physics, emphasized the distinction between
“number,” which was discrete and which had an indivisible unit, and “magnitude,” which
was “continuous” and which had no indivisibles (hence, for example, line segments 
were not made up of points). Magnitudes could be lengths of line segments, areas of 
regions of the plane, time, and other physically meaningful objects. Aristotle used 
the distinction between number and magnitude as part of his attempt to refute Zeno’s 
paradoxes. Aristotle was very influential in medieval Europe, and the distinction 
between number and magnitude is one of the Aristotelian ideas that needed to be 
overcome as part of the development of calculus, and more generally modern science. 

In Book VII of Euclid’s Elements there is a theory of ratios of numbers, which 
corresponds to what we call fractions. In Book V of the Elements there is a theory
of proportions, that is, ratios of magnitudes. The two magnitudes in a ratio must
be quantities of the same type, though ratios of different types of quantities can 
be compared. This theory includes definitions of order, addition and multiplication 
for ratios of magnitudes, and the Archimedean Principle is invoked. The theory of 
proportions is very important for a number of theorems in the Elements; for example,
it is needed to state the fact that the areas of two circles have the same proportion
as the squares of their diameters. In some ways this theory prefigures the idea of 
Dedekind cuts formulated by Dedekind in 1858 (as discussed below). The theory 
of proportions in the Elements is attributed to Eudoxus of Cnidus (408-355 BCE), 
and the Archimedean Principle should perhaps be named after Eudoxus rather than 
Archimedes (287-212 BCE), though this principle is also found in Book I of On the 
Sphere and Cylinder by Archimedes. 


Medieval Period 


The mathematicians in ancient India, who were more interested in algebra and com- 
puting than the ancient Greeks, did not have the philosophical restrictions of the latter 
in regard to numbers. This freedom to develop numbers eventually helped the devel- 
opment of calculus. Negative numbers and zero were conceptualized, and recognized 
as numbers, in ancient India. Brahmagupta (598-670), in the Brahmasphutasiddhanta 
of 628, considered zero to be the number obtained by subtracting a number from 
itself, and gave rules for addition, subtraction, multiplication and division of signed 
numbers and zero, though he did not understand division by zero. In the Lilavati of 
1150, Bhaskara II (1114-1185), also known as Bhaskaracharya, built upon the work 
of Brahmagupta, but recognized the problem of dividing by zero. 

The rise of algebra and analytic geometry helped promote the recognition of 
the real numbers as numbers. The algebra of Abu Ja’far Muhammad ibn Musa Al- 
Khwarizmi (c. 780-c. 850), in Al-kitab al-muhtasar fi hisab al-jabr wal-muqabala of 
around 825, allowed different types of numbers, rational and irrational, to be treated 
in a more unified way than before. The Latin rendering of the title of this work led to
the word “algebra.” 

The blurring of the distinction between different types of numbers was further 
enhanced by Abu Kamil Shuja ibn Aslam (c. 850-930), whose work on algebra 
influenced Leonardo of Pisa (1170-1250), also known as Fibonacci, who spread 
Arabic algebra in Europe via his book Liber abaci of 1202. Abu Mansur ibn Tahir 
Al-Baghdadi (c. 980-1037) more explicitly broke down the ancient Greek distinction 
between number and magnitude by forming a correspondence between numbers 
and lengths of line segments via multiples of a fixed line segment, where irrational 
numbers correspond to those line segments that are not rational multiples of the fixed 
line segment. He also demonstrated the density of the irrational numbers (which is 
proved in Theorem 2.6.13 (2)). 

Nicole Oresme (1323-1382) made two contributions to the development of numbers.
First, though he did not invent analytic geometry as we use it today, in
that he did not associate curves with equations in general, he made progress toward
analytic geometry around 1350, when he related the study of variation with
representation by coordinates. Second, he inquired about the meaning of raising a number to
an irrational power, a question which is not trivial to answer, and which was resolved
only with 19th-century rigor.

In parallel with the spread of algebra from the Arabic world to Europe was 
the spread of the decimal place-value system. Place-value systems for representing 
numbers (though not written the way we do) were invented separately in ancient 
Mesopotamia, India, China and Mesoamerica. The Maya system used a symbol for 
zero as a place-holder, and was base 20. The system in Mesopotamia was base 60, 
though initially it did not have zero. India and China used base 10 (called “decimal”).
The Hindu-Arabic system that we use for writing numbers today, that is, the decimal 
place-value system, has three aspects: place-value, base 10 and the Hindu-Arabic 
symbols for the numbers 0-9. The first of these three is by far the most important. 

The decimal place-value system for writing whole numbers, including the use of 
zero (sometimes denoted by a dot and sometimes by our modern symbol), originated 
in India apparently in the 7th century, and was fully developed by the 8th century. 
This system might have been inspired by Chinese counting boards which reached 
India via trade; the written system, in any case, spread from India to China and the 
Arab world. 

The first Arabic work that used the Indian system for writing whole numbers is 
Kitab al-jam wal-tafrig bi hisab al Hind of Al-Khwarizmi, which gave algorithms 
for addition, subtraction, multiplication, division and more. The Latin translation of 
this work helped spread the decimal place-value system, and the Latin rendering of 
Al-Khwarizmi’s name eventually became the word “algorithm.” 

A step forward in the development of the decimal place-value system was due
to Abu’l Hasan Ahmad ibn Ibrahim Al-Uqlidisi (920-980), in Kitab al-fusul fi al-
hisab al-Hindi of 952, who gave algorithms for use with pen and paper, as opposed 
to those of Al-Khwarizmi which were for dust board, and, more importantly, who 
introduced decimal fractions. This idea was further developed by Ibn Yahya al- 
Maghribi Al-Samawal (1130-1180) in 1172, who understood that some numbers 
cannot be expressed as finite decimal fractions, and can only be approximated by them. 
Al-Samawal also had a preliminary understanding of proof by induction. Ghiyath
al-Din Jamshid Mas’ud al-Kashi (c. 1380-1429), in the early 15th century, had a
thorough understanding of decimal fractions.

Gerbert of Aurillac (c. 946-1003), who later became Pope Sylvester II, was one
of the first people (if not the first) to introduce the decimal place-value system for
writing whole numbers (though not the number zero, and not decimal fractions) to 
the West. Gerbert learned this material from Arabic teachers while residing in Spain, 
which was partly under Arabic control at the time. This system was also promoted by 
Leonardo of Pisa in his widely circulated Liber abaci. 

The first real appearance of proof by induction, in specific examples though not 
stated as a general principle, was apparently in the study of combinatorics by Levi 
ben Gerson (1288-1344) in Maasei Hoshev of 1321. 


Renaissance 


Whereas ancient Greek and medieval European mathematicians had not regarded
irrational ratios as numbers, the Indian and Arab mathematicians did not distinguish
between rational and irrational numbers. By the 16th century, when Hindu-Arabic 
algebra was widely adopted in Europe, mathematicians there recognized irrational 
numbers as numbers, though irrational numbers were still considered distinct from 
other numbers, having earlier been called numerus surdus by Leonardo of Pisa (surdus 
is the Latin root for the word absurd, though it originally meant deaf). Negative 
numbers, which were not used in Europe as late as the start of the 14th century, were 
accepted in Europe by the 16th century, though with the stigma of being numeri falsi 
or numeri ficti; in subsequent centuries negative numbers were accepted as numbers 
without stigma. 

The decimal place-value system of writing whole numbers was spread in Europe 
by, among others, Robert Recorde (1510-1558) in The Grounde of Artes of 1543, and 
Adam Ries (1492-1559) in Rechenung nach der lenge, auff den Linihen vnd Feder of 
1550, which had the advantage of being printed, that method having been recently 
invented. (Children in Germany still say “nach Adam Riese,” which means “according 
to Adam Ries,” when doing arithmetic.) This system was slow to be accepted, and one 
of the people who was responsible for the popularity of this system was Simon Stevin 
(1548-1620), via his widely read and translated work De Thiende of 1585. Not only 
did Stevin promote the use of the decimal place-value system for whole numbers, as 
did his European predecessors, but he introduced the use of decimal fractions to the 
West (though he used only finite decimals, so that only certain rational numbers could 
be represented exactly); decimal fractions were previously known in the Arab world, 
but Stevin might have been unaware of that. What finally led to the widespread use of 
decimal fractions was Napier’s use of that notation in his work on logarithms. Stevin’s 
notation, which was not exactly the way we write numbers today, was brought into 
modern form in the English translation of 1616 of Napier’s work on logarithms. 

In addition to helping popularize the decimal place-value system, Stevin helped 
expand the notion of what a “number” consists of in L’Arithmetique of 1585. Though 
the ancient Greeks did not consider | to be a number, Stevin said that it was, and 
after that the idea of | as a number gained widespread acceptance. However, Stevin 
viewed 0 as the place where the natural numbers start, but not as a number itself. 
More importantly, in contrast to the earlier view that the word “number” referred only 
to fractions, but not irrational numbers, Stevin said that “every root is a number,” and 
did not call roots of numbers by a distinct name such as “absurd” or “surd.” Stevin 
was not the first person to blur the ancient Greek distinction between “number” and 
“magnitude,” but he may have been the first person to have stated explicitly that there 
is no such distinction. 

Franciscus Maurolycus (1494-1575), presumably unaware of the work of Levi 
ben Gerson, used proof by induction in 1575. Blaise Pascal (1623-1662), apparently 
aware of the work of Maurolycus, used proof by induction in his discussion of what we 
call Pascal’s triangle in 1665, and gave an explanation of this method of proof. 

A major step forward in the development of algebra is due to François Viète 
(1540-1603). In his book In artem analyticum isagoge of 1591 he developed an 
approach to the study of equations that focused on general cases rather than specific 
examples, and he promoted the use of symbols for variables and constants, which we 
take for granted today, but which was an innovation at the time. Viète used vowels for 
variables and consonants for constants; today we follow Descartes and use letters at 
the end of the alphabet for variables and letters at the beginning of the alphabet for 
constants. Viète’s emphasis on the use of symbols and writing general formulas was 
eventually incorporated in the development of calculus—the strength of calculus is 
precisely that a few simple general formulas allow for the solution of many specific 
problems (some of which were solved by ad hoc methods prior to the invention of 
calculus). 


Seventeenth Century 


The invention of analytic geometry independently, and simultaneously in 1637, by 
Pierre de Fermat (1601-1665) and René Descartes (1596-1650) helped promote 
the recognition of the real numbers as numbers. The work of the former, though 
unpublished and less influential, was perhaps more modern in starting with equations 
and associating curves to them. The work of the latter, published in the appendix 
La Géométrie of the philosophical work Discours de la méthode pour bien conduire 
sa raison et chercher la vérité dans les sciences, always started with geometrically 
defined curves, and then associated equations to them. 

In Euclid’s Elements, not only is the length of a line segment not represented by a 
number, but also, whereas numbers (meaning rational numbers) can be multiplied, 
lengths of line segments cannot be multiplied (the “product” of two line segments 
is a rectangle, which is a distinct type of geometric object). Descartes, by contrast, 
defined the product of the lengths a and b of two line segments by the property that 
the ratio of ab to b is the same as the ratio of a to a fixed length J (which represents 
the number 1). That is, for Descartes the lengths of line segments can be multiplied 
just as numbers are multiplied. The distinction between numbers and lengths of line 
segments had vanished for Descartes; the set of lengths of line segments corresponded 
to the set of real numbers, a correspondence that was needed to form a correspondence 
between equations and curves, which is at the heart of analytic geometry. Observe 
that Descartes’ approach to the real numbers as corresponding to the points on the 
line was not related to the decimal expansion of such numbers. 

On the one hand, by the 17th century numbers were viewed abstractly—no 
longer the number of things in a collection of objects—and all types of number 
were accepted as genuine numbers. On the other hand, the real numbers were still 
associated with geometric ideas such as lengths of line segments. This geometric 
association, combined with the perceived “continuity” of the real number line, allowed 
mathematicians to have an intuitive idea of limits of sequences of numbers. However, 
this geometric link led to a reliance on intuition that precluded the need for a more 
rigorous approach to numbers. 

Gottfried von Leibniz (1646-1716) was the first person to distinguish between 
algebraic numbers (that is, roots of polynomials with rational coefficients, for example 
√2) and transcendental numbers (that is, numbers that are not algebraic, for example 
π). In particular, Leibniz suggested that π might be transcendental, a fact that was 
proved only two centuries later. 

Carl Friedrich Gauss (1777-1855) used the idea of least upper bounds informally, 
which was a step forward in the development of the real numbers, but he did not 
provide a construction of the real numbers; he had the older notion of the real numbers 
as varying continuously, and took that as the basic intuitive idea of real analysis. 

The first person to have attempted to construct the real numbers from the rational 
numbers might have been Bernard Bolzano (1781-1848), who in the first half of the 
19th century defined the real numbers in terms of sequences of rational numbers, 
though he did not have the correct details. Bolzano made use of the Least Upper 
Bound Property, which he had proved assuming that Cauchy sequences are convergent, 
though his attempted proof that Cauchy sequences are convergent was incorrect (he 
formulated his “proof” prior to his attempt to construct the real numbers, so it was 
doomed to fail). In spite of its flaws, Bolzano’s approach to the real numbers was 
very insightful for its time, though he did not publish these ideas, and they did not 
influence subsequent developments. 

The attempt by Augustin Louis Cauchy (1789-1857) to put calculus on a firm 
foundation was a major advance in the development of real analysis, but his view 
of the real numbers was within the standard understanding of his time, and was 
not as insightful as Bolzano’s approach. In the textbook Cours d’analyse de l’École 
Royal Polytechnique of 1821, Cauchy used the fact that irrational numbers are limits 
of sequences of rational numbers, but from Cauchy’s perspective that was not a 
definition of irrational numbers but rather an observation about such numbers, which 
were simply assumed to exist. Cauchy implicitly assumed that Cauchy sequences (as 
we call them) are convergent, which would be intuitively true for someone who took 
the then prevalent intuitive view of the real numbers as “continuous,” but which from 
our perspective requires proof, and such a proof requires either a construction of the 
real numbers or an axiomatization of them. 

William Rowan Hamilton (1805-1865), in an effort to clarify the meaning of 
negative and imaginary numbers in the 1830s, gave a definition of negative numbers 
using a construction similar to the construction of the integers from the natural 
numbers found in Section 1.3. He then constructed the rational numbers from the 
integers, and attempted, though unsuccessfully, to construct the real numbers from 
the rational numbers. Using the real numbers, however constructed, he provided the 
modern construction of the complex numbers from the real numbers. 

The first proof of the existence of transcendental numbers was due to Joseph 
Liouville (1809-1882) in 1844; he proved that a specific number (concocted for 
the purpose, though not otherwise interesting) was transcendental. Charles Hermite 
(1822-1901) proved that the number e was transcendental in 1873, and Ferdinand 
von Lindemann (1852-1939) proved that π was transcendental in 1882. 

In spite of Bolzano’s and Hamilton’s earlier work, it was only in the second half 
of the 19th century that a broader effort was made to put the real numbers on an 
arithmetic, as opposed to geometric, basis; that is, to have the real numbers be based 
only upon the rational numbers (which in turn are based upon the integers). More 
generally, prior to the 19th century numbers were viewed as an example of the notion 
of “quantity,” which was tied to the view of numbers and other mathematical concepts 
as having meaning in the real world, or at most as abstractions of things having such 
meaning. By the end of the 19th century that view had changed, and the modern 
approach to numbers was obtained. This transition involved both the development of 
the notion of numbers per se, and also the development of set theory as the foundation 
of mathematics (and in particular as a framework for the axiomatization of the real 
numbers). 

The development of a rigorous treatment of the real numbers was the result of the 
desire to provide a firm foundation for calculus; without a good understanding of the 
real numbers, it was not possible to give complete proofs of some important results. 
For example, Bolzano’s and Cauchy’s proofs of the Intermediate Value Theorem 
implicitly used the Monotone Convergence Theorem, though that theorem had not 
been proved at the time; this theorem was also used implicitly by Cauchy and Riemann 
in proving that certain types of functions were integrable. To prove the Monotone 
Convergence Theorem it is necessary to use the fundamental properties of the real 
numbers. 

Possibly the first rigorous construction of the real numbers from the rational 
numbers was due to Richard Dedekind (1831-1916), who worked out his construction 
in lectures in 1858, to provide a foundation for a course in real analysis that did 
not rely upon geometric proofs. Dedekind did not publish these ideas until 1872, in 
Stetigkeit und irrationale Zahlen, when he saw that Heine and Cantor were about to 
publish their versions of the construction of the real numbers. Dedekind’s method, 
based upon what we now call “cuts,” harked back to Eudoxus’ approach to proportions 
as seen in Book V of Euclid’s Elements. Dedekind’s cuts were not exactly the same 
as what we call “Dedekind cuts” today; Bertrand Russell (1872-1970) used cuts the 
way we do now. 

Another early construction of the real numbers from the rational numbers was 
due to Karl Weierstrass (1815-1897), who seems to have first presented his ideas 
on the real numbers in lectures in 1863; he did not publish these ideas. Similarly to 
Dedekind, Weierstrass presented his ideas in order to provide a foundation for proofs 
in real analysis, removing all geometric reasoning from analysis, and basing it upon 
numbers alone. Weierstrass essentially said that an irrational number is by definition 
an “aggregate” of rational numbers that intuitively converge to a number. 

Charles Méray (1835-1911), first in 1869 and in more detail in 1872, defined 
convergence of sequences of rational numbers by using the Cauchy condition, and 
then essentially defined irrational numbers as Cauchy sequences that did not converge 
to rational limits; however, Méray was not entirely rigorous. Georg Cantor (1845-1918), 
apparently independently of Méray, had the idea of using Cauchy sequences 
of rational numbers to define the real numbers. This idea was taken up in 1872 by 
Cantor’s colleague at Halle, Eduard Heine (1821-1881), who provided a rigorous 
treatment of this approach via equivalence classes. Interestingly, Heine pointed out 
that if his construction is done starting with the real numbers, no additional numbers 
are obtained. 

The rigorous constructions of the real numbers from the rational numbers did not 
engender immediate universal support. For example, Leopold Kronecker (1823-1891), 
who opposed Cantor, believed only in numbers that could be finitely constructed 
from the natural numbers. When Lindemann proved that π is transcendental in 1882, 
Kronecker said “Of what use is your beautiful investigation of π? Why study such 
problems when irrational numbers do not exist?” 

Even after the real numbers were constructed from the rational numbers, some 
questions about the real numbers remained. First, if the real numbers are based upon 
the rational numbers, what are the rational numbers based upon? The rational num- 
bers, which are intuitively much simpler than the real numbers, can be constructed 
straightforwardly from the integers, which can in turn be constructed straightfor- 
wardly from the natural numbers. The question then arose as to the foundation of the 
natural numbers. The first person to characterize the natural numbers by axioms was 
apparently Dedekind, in Was sind und was sollen die Zahlen? of 1888, though again 
Dedekind formulated these ideas earlier. Dedekind’s approach to the natural numbers, 
similarly to the more familiar Peano Postulates, started with the number 1 and with an 
injective but not surjective successor function, though instead of induction Dedekind 
used the fact that the set of natural numbers is the smallest set with the previous two 
properties (this approach is similar to the definition of the natural numbers in Sec- 
tion 2.4). Gottlob Frege (1848-1925) published his approach to the natural numbers 
in the book Die Grundlagen der Arithmetik of 1884, though it did not have full details, 
and was not widely read. Giuseppe Peano (1858-1932) published his widely used 
postulates in 1889 (these postulates are used in Section 1.2). 

A more fundamental issue concerning the real numbers was raised by Frege, 
who questioned the approach of Weierstrass, Heine, Cantor, Dedekind, et al., asking 
how they knew that there were no logical contradictions that might arise from our 
assumptions about this set of numbers. Frege wanted to resolve the matter by basing 
numbers on logic (he used the notion of cardinal numbers, which in turn was based 
upon Cantor’s idea of two sets having the same cardinality). However, Russell’s 
paradox in set theory got in the way of Frege’s program being completed, because 
that paradox showed that even set theory, as understood at the time, had logical 
contradictions. 

Rather than constructing the real numbers from the rational numbers, which 
ultimately derive from the axioms for the natural numbers, David Hilbert (1862-1943) 
took the approach of defining the real numbers by their own set of axioms in 
Grundlagen der Geometrie of 1899 (where Hilbert was primarily concerned with 
providing a set of axioms for Euclidean geometry, to make complete what was missing 
from Euclid’s Elements). Hilbert’s original approach to the real numbers had axioms 
that define an ordered field, together with the Archimedean Principle. Those axioms 
do not, in fact, characterize the real numbers (the rational numbers also satisfy these 
properties), and in Über den Zahlbegriff of 1900, as well as in later editions of 
Grundlagen der Geometrie, Hilbert added the Axiom of Completeness, which says 
that the system cannot be made larger while maintaining all the other axioms; adding 
this axiom does lead to a characterization of the real numbers. Today we replace both 
the Archimedean Principle and the Axiom of Completeness with the Least Upper 
Bound Property, which implies both of the other properties. Hilbert did not prove 
that there was a system that satisfied his axioms, nor did he resolve Frege’s concern. 
Nonetheless, Hilbert’s approach to the real numbers is the one used today in most 
introductory treatments of real analysis. In this text we take the approaches of Peano 
and Dedekind in Chapter 1, and the approach of Hilbert in Chapter 2. 


2 


Properties of the Real Numbers 


2.1 Introduction 


In order to prove the theorems of real analysis we need to make use of the properties 
of the real numbers, and hence a rigorous treatment of real analysis requires a rigorous 
foundation for the real numbers. As discussed in Section 1.1, we offer in this text three 
ways to enter into the study of the real numbers. The first two ways were given in 
Chapter 1, namely, Entry 1 in Section 1.2, which begins with an axiomatic treatment 
of the natural numbers, and Entry 2 in Section 1.4, which begins with an axiomatic 
treatment of the integers. Starting with either Entry 1 or Entry 2, the culmination of 
Chapter 1 is the construction of the real numbers, together with the proofs of the core 
properties of these numbers, stated in Theorem 1.7.6 and Theorem 1.7.9. 

In the present chapter we take a more direct route to the main properties of 
the real numbers. Rather than starting with the natural numbers or the integers and 
constructing the real numbers, we have Entry 3 in Section 2.2, which begins with 
an axiomatic treatment of the real numbers. What is taken as axiomatic in Entry 3 
is nothing other than what was proved in Theorem 1.7.6 and Theorem 1.7.9; and 
what was taken as axiomatic in Entry 1 and Entry 2 is now proved in Section 2.4, 
where it is shown that inside the real numbers sit the natural numbers, the integers 
and the rational numbers, with the expected properties. The approach taken in the 
present section, which is the one taken in most introductory real analysis texts, leads 
as quickly as possible to the core topics in real analysis, but it requires a larger set 
of hypotheses to be taken as axiomatic, and it provides less insight into the number 
systems than the approaches in Chapter 1. 

The reader who starts the study of the real numbers with Entry 3 in Section 2.2 can 
safely have skipped Chapter 1, with the exception of Definition 1.1.1 in Section 1.1, 
which the reader should now read. The reader who has read Chapter 1 (starting in 
either Section 1.2 or 1.4) should skip Section 2.2 and go straight to Section 2.3. 


2.2 Entry 3: Axioms for the Real Numbers 


When we think of “all the numbers” that we normally encounter, such as 1, 4, √2 
and π, and when we think of these numbers as forming the “number line,” what we 
are really thinking of is the set of real numbers. (The reader might be familiar with 
the complex numbers, which are extremely useful in various situations, but they are 
not part of the real numbers—quite the opposite, they contain the real numbers—and 
we will not make use of complex numbers in this text.) 

The set of real numbers is the mathematical universe in which the study of real 
analysis occurs. The set of rational numbers, by contrast, is not sufficient for our 
purposes, in the sense that various important theorems in real analysis, for example 
the Extreme Value Theorem and the Intermediate Value Theorem (both proved in 
Section 3.5), are not true in the context of the rational numbers. When comparing the 
set of real numbers with the set of rational numbers, the most immediate difference, 
informally, is that the real numbers contain various numbers, such as √2, that are not 
found in the rational numbers. However, the advantage of the set of real numbers over 
the set of rational numbers is not simply that the former has some useful numbers 
missing from the latter (which, for example, allow certain polynomial equations to 
be solved), but rather it is that the real numbers taken as a whole do not have “gaps.” 
Of course, the term “gap” is not a rigorous concept, and the real numbers have so 
many useful properties that it is not at all obvious which properties to select as a 
minimal set of properties that characterize the real numbers as distinct from other 
sets of numbers, and in particular which property (or properties) of the real numbers 
captures the notion of not having “gaps.” 

Some of the needed properties of the real numbers are algebraic in nature, for 
example properties about addition and multiplication, though it turns out that such 
properties do not suffice to distinguish the real numbers from some other sets of 
numbers, for example the set of rational numbers. Hence we will need both algebraic 
axioms as well as something additional. We start with the former. 

The algebraic properties we need involve not only addition and multiplication, 
but also the relation less than. These algebraic properties are combined in the notion 
of an ordered field, as defined below. The reader who has studied abstract algebra 
has probably encountered the concept of a field, though perhaps not an ordered field, 
which is simply a field as standardly defined in abstract algebra together with some 
additional axioms about the order relation, and the interaction of this relation with 
addition and multiplication. We do not assume that the reader is already familiar with 
fields, and we will give the complete definition here; we will prove all the algebraic 
properties of fields that we need for our purposes in Section 2.3. 

In the following set of axioms, we use the notion of a binary operation and a unary 
operation on a set, as defined in Section 1.1. As is usual, we will write “xy” instead of 
“x·y,” except in cases of potential ambiguity (for example, we will write “1·1” rather 
than “11”), or where the · makes the expression easier to read. We will write x > y to 
mean the same thing as y < x. 


Definition 2.2.1. An ordered field is a set F with elements 0, 1 ∈ F, binary operations 
+ and ·, a unary operation −, a relation <, and a unary operation ^{-1} on F − {0}, 
which satisfy the following properties. Let x, y, z ∈ F. 


a. (x + y) + z = x + (y + z) (Associative Law for Addition). 

b. x + y = y + x (Commutative Law for Addition). 

c. x + 0 = x (Identity Law for Addition). 

d. x + (−x) = 0 (Inverses Law for Addition). 

e. (xy)z = x(yz) (Associative Law for Multiplication). 

f. xy = yx (Commutative Law for Multiplication). 

g. x · 1 = x (Identity Law for Multiplication). 

h. If x ≠ 0, then x x^{-1} = 1 (Inverses Law for Multiplication). 

i. x(y + z) = xy + xz (Distributive Law). 

j. Precisely one of x < y or x = y or x > y holds (Trichotomy Law). 

k. If x < y and y < z, then x < z (Transitive Law). 

l. If x < y then x + z < y + z (Addition Law for Order). 

m. If x < y and z > 0, then xz < yz (Multiplication Law for Order). 

n. 0 ≠ 1 (Non-Triviality). △ 


The Non-Triviality axiom might seem, well, trivial, but it is very much needed, 
because otherwise we could have an ordered field consisting of a single number 0, 
which is not what we would want the set of real numbers to be. 
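
As an informal sanity check (not a proof), the properties in Definition 2.2.1 can be tested on a handful of rational numbers, which form one of the familiar examples of an ordered field. The sketch below uses Python's fractions.Fraction for exact arithmetic; the function name check_ordered_field_axioms and the sample values are illustrative assumptions and are not part of the text.

```python
from fractions import Fraction
from itertools import product


def check_ordered_field_axioms(samples):
    """Return True if Properties (a)-(n) hold for all x, y, z taken from samples.

    Only finitely many cases are tested, so this illustrates the axioms
    rather than proving them.
    """
    zero, one = Fraction(0), Fraction(1)
    if zero == one:  # (n) Non-Triviality
        return False
    for x, y, z in product(samples, repeat=3):
        checks = [
            (x + y) + z == x + (y + z),                 # (a)
            x + y == y + x,                             # (b)
            x + zero == x,                              # (c)
            x + (-x) == zero,                           # (d)
            (x * y) * z == x * (y * z),                 # (e)
            x * y == y * x,                             # (f)
            x * one == x,                               # (g)
            x == zero or x * (1 / x) == one,            # (h) only for x != 0
            x * (y + z) == x * y + x * z,               # (i)
            [x < y, x == y, x > y].count(True) == 1,    # (j)
            not (x < y and y < z) or x < z,             # (k)
            not (x < y) or x + z < y + z,               # (l)
            not (x < y and z > zero) or x * z < y * z,  # (m)
        ]
        if not all(checks):
            return False
    return True


# Hypothetical sample values; any finite list of Fractions would do.
samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]
print(check_ordered_field_axioms(samples))
```

Each implication "if P then Q" is encoded as "not P or Q", so a property with a false hypothesis counts as satisfied, exactly as in the definition.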

The properties of an ordered field do not alone characterize the real numbers, 
because the rational numbers are also an ordered field (as was proved in Theorem 1.5.5, 
for the reader who commenced the study of the real numbers in Chapter 1, and as can 
be deduced from Axiom 2.2.4 and Corollary 2.4.14 for the reader who commences in 
the present section). In order to distinguish the real numbers from all other ordered 
fields, we will need one additional axiom, to which we now turn. This axiom uses 
the concepts of upper bounds and least upper bounds; while we are at it, we will 
also define the related concepts of lower bounds and greatest lower bounds. To avoid 
interrupting the flow of our treatment of the real numbers, we state the following 
definition without further discussion now, but we will have a full discussion of it in 
Section 2.6. 


Definition 2.2.2. Let F be an ordered field, and let A ⊆ F be a set. 


1. The set A is bounded above if there is some M ∈ F such that x ≤ M for all 
x ∈ A. The number M is called an upper bound of A. 

2. The set A is bounded below if there is some P ∈ F such that x ≥ P for all 
x ∈ A. The number P is called a lower bound of A. 

3. The set A is bounded if it is bounded above and bounded below. 

4. Let M ∈ F. The number M is a least upper bound (also called a supremum) 
of A if M is an upper bound of A, and if M ≤ T for all upper bounds T of A. 

5. Let P ∈ F. The number P is a greatest lower bound (also called an infimum) 
of A if P is a lower bound of A, and if P ≥ V for all lower bounds V of A. △ 
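
For a finite non-empty set of rational numbers the notions in Definition 2.2.2 can be checked directly, and the least upper bound is simply the largest element. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the least-upper-bound test compares only against a finite list of candidate bounds, so it illustrates the definition rather than verifying it over the whole field.

```python
from fractions import Fraction


def is_upper_bound(m, a):
    """True if x <= m for all x in the finite set a."""
    return all(x <= m for x in a)


def is_lower_bound(p, a):
    """True if x >= p for all x in the finite set a."""
    return all(x >= p for x in a)


def is_least_upper_bound(m, a, candidates):
    """True if m is an upper bound of a and m <= t for every upper bound t in candidates."""
    return is_upper_bound(m, a) and all(
        m <= t for t in candidates if is_upper_bound(t, a)
    )


a = {Fraction(1, 2), Fraction(-3), Fraction(7, 4)}
# Candidate bounds: the multiples of 1/4 between -5 and 5.
candidates = [Fraction(n, 4) for n in range(-20, 21)]

print(is_upper_bound(Fraction(2), a))                    # 2 is an upper bound
print(is_least_upper_bound(Fraction(2), a, candidates))  # but 7/4 is a smaller one
print(is_least_upper_bound(max(a), a, candidates))       # the maximum is the least upper bound
```

Here 2 is an upper bound of the set but not the least one, because 7/4 is a smaller upper bound that belongs to the set itself.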


As the reader is asked to verify in Exercise 2.3.11, a subset A ⊆ ℝ is bounded if 
and only if there is some M ∈ ℝ such that |x| ≤ M for all x ∈ A; it is always possible 
to choose M so that M > 0. The proof of this fact has to wait until Section 2.3 because 
that is where we define absolute value. 

Whereas having upper bounds and lower bounds of subsets of an ordered field is 
not remarkable, it is the existence of least upper bounds and greatest lower bounds 
that characterizes the real numbers. As the reader has most likely encountered, and 
as we will prove rigorously in Theorem 2.6.11, the number √2 is a real number but 
not a rational number. If we look at the set of all rational numbers that are less than 
√2, that set certainly has an upper bound (for example 2), but it does not have a least 
upper bound (intuitively because the only possible candidate for such a least upper 
bound would be √2, and that is not in the set of rational numbers). The existence of 
such problems is what renders the set of rational numbers unfit to be the basis for 
calculus. No such problem exists in the set of real numbers, and it is this difference 
between the real numbers and the rational numbers that is the basis for the following 
definition, and for our axiomatic characterization of the real numbers. Of course, not 
every subset of the real numbers has a least upper bound, for example a set that has 
no upper bound, but what truly characterizes the real numbers is that if a non-empty 
set has an upper bound, then it must have a least upper bound. 
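
The failure of the rational numbers to supply a least upper bound for the set of rationals less than √2 can be made concrete. Granting that no rational number has square 2, a positive rational q is an upper bound of that set exactly when q² > 2. The sketch below shows that any such q can be replaced by a strictly smaller rational upper bound, so no rational upper bound is the least one. The averaging step (q + 2/q)/2 is a standard device chosen here for illustration and is an assumption of this sketch, not a construction taken from the text.

```python
from fractions import Fraction


def is_rational_upper_bound(q):
    """True if the rational q is an upper bound of {x rational : x < sqrt(2)}.

    Floating point is avoided: for rational q this is equivalent to
    q > 0 and q * q > 2.
    """
    return q > 0 and q * q > 2


def smaller_upper_bound(q):
    """Given a rational upper bound q, return a strictly smaller rational upper bound."""
    return (q + 2 / q) / 2


q = Fraction(2)
for _ in range(4):
    smaller = smaller_upper_bound(q)
    # The new value is still an upper bound, and it is strictly below the old one.
    assert is_rational_upper_bound(smaller) and smaller < q
    print(q, "->", smaller)
    q = smaller
```

The step works because, writing q' = (q + 2/q)/2, one has q − q' = (q² − 2)/(2q) > 0 and q'² − 2 = (q² − 2)²/(4q²) > 0 whenever q > 0 and q² > 2.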


Definition 2.2.3. Let F be an ordered field. The ordered field F satisfies the Least 
Upper Bound Property if every non-empty subset of F that is bounded above has a 
least upper bound. △ 


We are now ready for our axiomatic characterization of the real numbers, which 
combines algebraic properties together with the Least Upper Bound Property. 


Axiom 2.2.4 (Axiom for the Real Numbers). There exists an ordered field ℝ that 
satisfies the Least Upper Bound Property. 


Observe that the Axiom for the Real Numbers (Axiom 2.2.4) does not say that the 
real numbers are unique, though in fact that turns out to be true, as will be proved in 
Section 2.7. 

Although it may not seem very impressive upon first encounter, we will see in 
Section 2.6, and throughout this text, just how powerful the Least Upper Bound 
Property of the real numbers is. Indeed, virtually all the major theorems in this text, 
concerning such topics as continuous functions, derivatives, integrals, sequences and 
series, rely upon the Least Upper Bound Property. 


Reflections 


In the modern approach to mathematics, as seen for example by anyone who 
has taken a course in abstract algebra, it is standard to define new objects of study 
axiomatically. It should therefore not surprise the reader that one of the ways, and in 
fact the most common way, of studying real analysis is by starting with axioms for 
the real numbers. What is surprising is that the real numbers can be characterized by 
such a relatively simple set of axioms. The axioms for an ordered field are entirely 
straightforward and not unexpected, and the Least Upper Bound Property, while not 
as intuitively simple as the axioms for an ordered field, is also not complicated, and 
is not intuitively unreasonable in retrospect. And yet, though easily stated, the Least 
Upper Bound Property is sufficient to distinguish between the real numbers and all 
other ordered fields, and is sufficient to make all of calculus work. The fact that 
mathematicians, at the very end of the lengthy period of time during which calculus 
was developed, were able to figure out precisely what makes the real numbers work as 
they do should be viewed as a remarkable solution to a very major intellectual puzzle. 

The reader might wonder whether it would be possible to replace the Least Upper 
Bound Property with some other (perhaps more familiar) axiom, and still obtain the 
real numbers. There are, in fact, a number of possible replacements for the Least 
Upper Bound Property, some of which should be familiar to the reader. As we will see 
in Theorem 3.5.4 and Theorem 8.3.17, there are a number of important theorems in 
calculus, such as the Extreme Value Theorem (Theorem 3.5.1) and the Intermediate 
Value Theorem (Theorem 3.5.2), which are equivalent to the Least Upper Bound 
Property; any of the theorems that are equivalent to the Least Upper Bound Property 
could substitute for the latter in the axioms for the real numbers. However, the Least 
Upper Bound Property is by far easier to state than any of these other results (which 
require the definitions of concepts such as continuous functions), and so it makes 
sense to adopt the Least Upper Bound Property as the axiom. Moreover, if we were to 
adopt one of these equivalent theorems as an axiom for the real numbers, we would 
immediately have had to derive the Least Upper Bound Property from that axiom 
in order to prove the other theorems of calculus, and so it is more efficient to start 
with the Least Upper Bound Property. Of course, the Greatest Lower Bound Property 
would be just as good an axiom as the Least Upper Bound Property, and it is an 
arbitrary, though by now standard, decision to take the latter rather than the former 
axiomatically. 


2.3 Algebraic Properties of the Real Numbers 


Whether you have read Section 1.7, in which the real numbers were constructed and 
their basic properties were proved, or whether you started reading from Section 2.2, 
where we assumed these basic properties as axioms for the real numbers, from now 
on we will be making use of the properties of the real numbers only, and not how the 
real numbers are constructed. In general, what counts in mathematics is how objects 
behave, not “what they are.” Ultimately, the real numbers are numbers that behave in 
a certain way. 

In this section, and in the remaining sections of this chapter, we will explore 
various aspects of the real numbers, and we will show that they all follow from 
the axiomatic properties of the real numbers. In the present section we will discuss 
various useful algebraic properties of the real numbers, by which we mean properties 
that follow strictly from those aspects of the real numbers that were proved in Theo- 
rem 1.7.6 (for those readers who have read Section 1.7), and were stated as the axiom 
for an ordered field in Definition 2.2.1 (for those readers who have read Section 2.2). 
The majority of the facts that we prove in the present section are certainly very 
familiar to the reader, and as such require no motivation, but we need to prove these 
facts nonetheless, because we will use them later on, and we want to make sure that 
everything we prove about real analysis throughout this text is ultimately derived 
from nothing more than the axioms for the real numbers. 

In order to show exactly how important the axiomatic properties of the real 
numbers are, and to make sure that we are not using any assumptions about the real 
numbers other than those we have stated as axiomatic, we will explicitly refer to such 
properties in the present section (including exercises) whenever they are used. Such 
explicit references to the axiomatic properties of the real numbers are admittedly 
tedious, and we will not do so after the present section, other than references to the 
Least Upper Bound Property, which is always worth mentioning. 

For convenience, we need the following terminology and notation. 


Definition 2.3.1. 


1. The binary operation − on ℝ is defined by a − b = a + (−b) for all a, b ∈ ℝ. 
The binary operation ÷ on ℝ − {0} is defined by a ÷ b = ab^{-1} for all a, b ∈ 
ℝ − {0}; we also let 0 ÷ s = 0 · s^{-1} = 0 for all s ∈ ℝ − {0}. The number a ÷ b 
is also denoted a/b. 

2. Let a ∈ ℝ. The square of a, denoted a^2, is defined by a^2 = a · a. 

3. The relation ≤ on ℝ is defined by x ≤ y if and only if x < y or x = y, for all 
x, y ∈ ℝ. 

4. The number 2 ∈ ℝ is defined by 2 = 1 + 1. △ 


Observe that if b ∈ R then 0 − b = −b, and if b ≠ 0 then 1/b = b⁻¹.
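
Both observations follow in a few steps from Definition 2.3.1 and the field axioms. The following is only a sketch, assuming that the Identity Laws are stated in the forms a + 0 = a and a·1 = a; if Definition 2.2.1 states them in the other order, the Commutative Law steps are unnecessary.

```latex
\begin{align*}
0 - b &= 0 + (-b) && \text{definition of } - \\
      &= (-b) + 0 && \text{Commutative Law for Addition} \\
      &= -b       && \text{Identity Law for Addition} \\[4pt]
\frac{1}{b} &= 1 \cdot b^{-1} && \text{definition of } \div \ (b \neq 0) \\
            &= b^{-1} \cdot 1 && \text{Commutative Law for Multiplication} \\
            &= b^{-1}         && \text{Identity Law for Multiplication}
\end{align*}
```

The second computation uses the fact that 1 ≠ 0 (Non-Triviality), so that 1 belongs to R − {0} and the quotient is defined.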

We now see some basic algebraic properties of the real numbers involving addition 
and multiplication. The reader who has studied some abstract algebra can safely skip 
the proof of the following lemma, having already seen many such proofs. As is usual, 
we will write “—ab” when we mean “—(ab).” 


Lemma 2.3.2. Let a, b, c ∈ R.


1. If a + c = b + c then a = b (Cancellation Law for Addition).

2. If a + b = a then b = 0.

3. If a + b = 0 then b = −a.

4. −(a + b) = (−a) + (−b).

5. −0 = 0.

6. If ac = bc and c ≠ 0, then a = b (Cancellation Law for Multiplication).

7. 0·a = 0 = a·0.

8. If ab = a and a ≠ 0, then b = 1.

9. If ab = 1 then b = a⁻¹.

10. If a ≠ 0 and b ≠ 0, then (ab)⁻¹ = a⁻¹b⁻¹.

11. (−1)·a = −a.

12. (−a)b = −ab = a(−b).

13. −(−a) = a.

14. (−1)² = 1 and 1⁻¹ = 1.

15. If ab = 0, then a = 0 or b = 0 (No Zero Divisors Law).

16. If a ≠ 0 then (a⁻¹)⁻¹ = a.

17. If a ≠ 0 then (−a)⁻¹ = −a⁻¹.


Proof. We will prove Parts (1), (5), (7), (9), (11) and (15), leaving the rest to the
reader in Exercise 2.3.1. 


(1) Suppose that a + c = b + c. Then (a + c) + (−c) = (b + c) + (−c). By the
Associative Law for Addition it follows that a + (c + (−c)) = b + (c + (−c)), and
hence by the Inverses Law for Addition we see that a + 0 = b + 0, and therefore by
the Identity Law for Addition we conclude that a = b.


(5) By the Identity Law for Addition we know that 0 + 0 = 0, and it follows from
Part (3) of this lemma that 0 = −0.


(7) By the Identity Law for Addition we know that 0 + 0 = 0, and hence a(0 +
0) = a·0. By the Distributive Law it follows that a·0 + a·0 = a·0, and using the
Identity and Commutative Laws for Addition we deduce that a·0 + a·0 = 0 + a·0.
It now follows from Part (1) of this lemma that a·0 = 0. The Commutative Law for
Multiplication then implies that 0·a = 0.


(9) Suppose that ab = 1. It cannot be the case that a = 0, because if a were 0,
then we would have ab = 0 by Part (7) of this lemma, which would then imply that
0 = 1, which is a contradiction to Non-Triviality. Hence a⁻¹ exists, and then using
the Identity, Associative, Commutative and Inverses Laws for Multiplication we see
that a⁻¹ = a⁻¹·1 = a⁻¹(ab) = (a⁻¹a)b = 1·b = b.


(11) By the Identity Law for Multiplication, the Distributive Law, the Inverses
Law for Addition and Part (7) of this lemma we see that a + a·(−1) = a·1 + a·(−1) =
a[1 + (−1)] = a·0 = 0. Part (3) of this lemma then implies that a·(−1) = −a, and by
the Commutative Law for Multiplication we conclude that (−1)·a = −a.


(15) Suppose that ab = 0 and that a ≠ 0. Hence a⁻¹ exists, and Part (7) of this
lemma, and the Commutative, Associative, Inverses and Identity Laws for
Multiplication, imply that 0 = a⁻¹·0 = a⁻¹(ab) = (a⁻¹a)b = 1·b = b.
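
Part (3), which is used in the proofs of Parts (5) and (11) above, is among the parts left to Exercise 2.3.1. One possible argument, offered only as a sketch and assuming that the Inverses Law for Addition is stated in the form a + (−a) = 0, reduces it to Part (1):

```latex
\begin{align*}
a + b &= 0        && \text{hypothesis} \\
a + b &= a + (-a) && \text{Inverses Law for Addition} \\
b + a &= (-a) + a && \text{Commutative Law for Addition, on both sides} \\
b     &= -a       && \text{Part (1), cancelling } a
\end{align*}
```

The same pattern, rewriting both sides so that a common term can be cancelled, is a natural first thing to try for several of the other omitted parts.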


Observe that Lemma 2.3.2 (2) shows that 0 is unique, Part (8) of the lemma shows 
that 1 is unique, Part (3) of the lemma shows that —a is unique and Part (9) of the 
lemma shows that a~! is unique. 

We now turn to some properties of the real numbers involving the relations less 
than, and less than or equal to. 


Lemma 2.3.3. Let a, b, c, d ∈ R.


1. If a ≤ b and b ≤ a, then a = b.

2. If a ≤ b and b ≤ c, then a ≤ c. If a ≤ b and b < c, then a < c. If a < b and
b ≤ c, then a < c.

3. If a ≤ b then a + c ≤ b + c.

4. If a ≤ b and c ≤ d, then a + c ≤ b + d; if a ≤ b and c < d, then a + c < b + d.

5. a > 0 if and only if −a < 0, and a < 0 if and only if −a > 0; also a ≥ 0 if and
only if −a ≤ 0, and a ≤ 0 if and only if −a ≥ 0.

6. a < b if and only if b − a > 0 if and only if −b < −a. Also a ≤ b if and only if
b − a ≥ 0 if and only if −b ≤ −a.

7. If a ≠ 0 then a² > 0.

8. −1 < 0 < 1.

9. a < a + 1.

10. If a ≤ b and c > 0, then ac ≤ bc.

11. If 0 ≤ a < b and 0 ≤ c < d, then ac < bd; if 0 ≤ a ≤ b and 0 ≤ c ≤ d, then
ac ≤ bd.

12. If a < b and c < 0, then ac > bc.

13. If a > 0 then a⁻¹ > 0.

14. If a > 0 and b > 0, then a < b if and only if b⁻¹ < a⁻¹ if and only if a² < b².


Proof. We will prove Parts (1), (3), (5), (7), (8), (11) and (12), leaving the rest to the 
reader in Exercise 2.3.2. 


(1) Suppose that a ≤ b and b ≤ a. By the definition of the relation ≤ we see that
a < b or a = b, and that b < a or b = a. First, suppose that a < b. By the Trichotomy
Law it cannot be the case that a = b or a > b, leading to a contradiction. A similar
contradiction occurs if we assume that b < a. The only possibility that remains is that
a = b.


(3) Suppose that a ≤ b. Then a < b or a = b. There are now two cases. First,
suppose that a < b. Then by the Addition Law for Order we know that a + c < b + c.
Second, suppose that a = b. Then clearly a + c = b + c. Hence a + c < b + c or
a + c = b + c, which means that a + c ≤ b + c.


(5) Suppose that a > 0. Then by the Addition Law for Order we see that a +
(−a) > 0 + (−a), and then using the Commutative, Identity and Inverses Laws for
Addition, we deduce that 0 > −a. Similarly, if −a < 0, then (−a) + a < 0 + a, and it
follows that 0 < a. The proofs of the other three parts are similar, and we omit the
details.


(7) Suppose that a ≠ 0. Then by the Trichotomy Law we know that a > 0 or
a < 0. First, suppose that a > 0. Then by the Multiplication Law for Order (using
a > 0 in the role of both inequalities in that law), we deduce that a·a > 0·a. It then
follows from Lemma 2.3.2 (7) that a² > 0.

Second, suppose that a < 0. By Part (5) of this lemma we know that −a > 0.
Using the previous paragraph we deduce that (−a)² > 0. Applying Lemma 2.3.2 (12)
twice we see that (−a)² = (−a)(−a) = −[a(−a)] = −[−a²], and by Part (13) of that
lemma we deduce that (−a)² = a². It follows that a² > 0.


(8) By Non-Triviality we know that 1 ≠ 0. By Part (7) of this lemma it follows
that 1² > 0, and by the Identity Law for Multiplication we deduce that 1 > 0.

From Part (5) of this lemma we now see that −1 < −0, and by Lemma 2.3.2 (5)
we deduce that −1 < 0.


(11) Suppose that 0 ≤ a < b and 0 ≤ c < d. By Part (2) of this lemma we see that
b > 0 and d > 0.

There are now two cases. First, suppose that a = 0. By Lemma 2.3.2 (7) we then
deduce that ac = 0. Because b > 0 and d > 0, it follows from the Multiplication Law
for Order and Lemma 2.3.2 (7) again that 0 = 0·d < bd. Hence ac < bd.

Second, suppose that a > 0. We then use the Multiplication Law for Order and
the Commutative Law for Multiplication to deduce that ac < ad. Because d > 0, we
use the Multiplication Law for Order again to deduce that ad < bd. By the Transitive
Law it follows that ac < bd.

The proof of the other part is similar, and we omit the details.


(12) Suppose that a < b and that c < 0. By Part (5) of this lemma we know that
−c > 0. By the Multiplication Law for Order it follows that a(−c) < b(−c). We then
use Lemma 2.3.2 (12) to see that −ac < −bc. It now follows from Part (6) of this
lemma that bc < ac.
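
Part (9) is left to Exercise 2.3.2, but it is used later (for example, when showing that an interval of the form [1,∞) is inductive), so a brief sketch may be helpful. The outline below assumes that the Addition Law for Order has the form "if a < b then a + c < b + c," as it is used in the proof of Part (3); it is one possible argument and not necessarily the intended one.

```latex
\begin{align*}
0     &< 1     && \text{Part (8)} \\
0 + a &< 1 + a && \text{Addition Law for Order} \\
a + 0 &< a + 1 && \text{Commutative Law for Addition} \\
a     &< a + 1 && \text{Identity Law for Addition}
\end{align*}
```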


We now have a definition and lemma concerning positive and negative real
numbers.


Definition 2.3.4. Let a ∈ R. The number a is positive if a > 0; the number a is
negative if a < 0; and the number a is non-negative if a ≥ 0. △


Lemma 2.3.5. Let a, b, c, d ∈ R.


1. If a > 0 and b > 0, then a + b > 0. If a > 0 and b ≥ 0, then a + b > 0. If a ≥ 0
and b ≥ 0, then a + b ≥ 0.

2. If a < 0 and b < 0, then a + b < 0. If a < 0 and b ≤ 0, then a + b < 0. If a ≤ 0
and b ≤ 0, then a + b ≤ 0.

3. If a > 0 and b > 0, then ab > 0. If a > 0 and b ≥ 0, then ab ≥ 0. If a ≥ 0 and
b ≥ 0, then ab ≥ 0.

4. If a < 0 and b < 0, then ab > 0. If a < 0 and b ≤ 0, then ab ≥ 0. If a ≤ 0 and
b ≤ 0, then ab ≥ 0.

5. If a < 0 and b > 0, then ab < 0. If a < 0 and b ≥ 0, then ab ≤ 0. If a ≤ 0 and
b > 0, then ab ≤ 0. If a ≤ 0 and b ≥ 0, then ab ≤ 0.


Proof. We will prove Parts (1) and (5), leaving the rest to the reader in Exercise 2.3.4. 


(1) First, suppose that a > 0 and b > 0. Then by Lemma 2.3.3 (4) and the Identity
Law for Addition we see that a + b > 0 + 0 = 0.

Second, suppose that a > 0 and b ≥ 0. There are now two subcases. First, suppose
that b > 0. Then by the previous paragraph we know that a + b > 0. Second, suppose
that b = 0. Then by the Identity Law for Addition we see that a + b = a + 0 = a > 0.

Third, suppose that a ≥ 0 and b ≥ 0. There are now two subcases. First, suppose
that a > 0. Then by the previous paragraph we know that a + b > 0, which implies
that a + b ≥ 0. Second, suppose that a = 0. Then by the Commutative and Identity
Laws for Addition we see that a + b = 0 + b = b + 0 = b ≥ 0.


(5) First, suppose that a < 0 and b > 0. By the Multiplication Law for Order and
Lemma 2.3.2 (7) we deduce that ab < 0·b = 0.

Second, suppose that a < 0 and b ≥ 0. There are now two subcases. First, suppose
that b > 0. Then by the previous paragraph we know that ab < 0, which implies
that ab ≤ 0. Second, suppose that b = 0. Then by Lemma 2.3.2 (7) we see that
ab = a·0 = 0, and hence ab ≤ 0.

The proofs of the other two parts are similar, and we omit the details.


Observe that Lemma 2.3.5 (1) implies that the positive numbers are closed under
addition, and Part (3) of the lemma implies that the positive numbers are closed under 
multiplication. This use of the term “closed” is employed only informally, and only 
occasionally, in contrast to the very different, and very important, use of this same 
word in the following definition. 

We now define various types of intervals in the real numbers. The reader has most 
likely encountered the use of intervals in previous mathematics courses, for example 
precalculus and calculus, but their importance might not have been evident in those 
courses. By contrast, the various types of intervals play a fundamental role in real 
analysis. As the reader will see throughout this text, many of the important theorems 
in real analysis are stated in terms of intervals (either open or closed, depending upon 
the situation). 


Definition 2.3.6. An open bounded interval is a set of the form
(a,b) = {x ∈ R | a < x < b},
where a, b ∈ R and a ≤ b. A closed bounded interval is a set of the form
[a,b] = {x ∈ R | a ≤ x ≤ b},
where a, b ∈ R and a ≤ b. A half-open interval is a set of the form
[a,b) = {x ∈ R | a ≤ x < b} or (a,b] = {x ∈ R | a < x ≤ b},
where a, b ∈ R and a ≤ b. An open unbounded interval is a set of the form
(a,∞) = {x ∈ R | a < x} or (−∞,b) = {x ∈ R | x < b} or (−∞,∞) = R,
where a, b ∈ R. A closed unbounded interval is a set of the form
[a,∞) = {x ∈ R | a ≤ x} or (−∞,b] = {x ∈ R | x ≤ b},
where a, b ∈ R.

An open interval is either an open bounded interval or an open unbounded
interval. A closed interval is either a closed bounded interval or a closed unbounded
interval. A right unbounded interval is any interval of the form (a,∞), [a,∞) or
(−∞,∞). A left unbounded interval is any interval of the form (−∞,b), (−∞,b] or
(−∞,∞). A non-degenerate interval is any interval of the form (a,b), (a,b], [a,b)
or [a,b] where a < b, or any unbounded interval. The number a in intervals of the
form [a,b), [a,b] or [a,∞) is called the left endpoint of the interval. The number b
in intervals of the form (a,b], [a,b] or (−∞,b] is called the right endpoint of the
interval. An endpoint of an interval is either a left endpoint or a right endpoint. The
interior of an interval is everything in the interval other than its endpoints (if it has
any). △
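
When the two endpoints of a bounded interval coincide, the set descriptions above collapse to the empty set or to a single point, which is why the definition of a non-degenerate interval requires a < b. As a small illustration (not part of the definition), unwinding the set descriptions with b = a gives:

```latex
\begin{align*}
(a,a) &= \{x \in \mathbb{R} \mid a < x < a\}     = \emptyset, \\
[a,a) &= \{x \in \mathbb{R} \mid a \le x < a\}   = \emptyset, \\
(a,a] &= \{x \in \mathbb{R} \mid a < x \le a\}   = \emptyset, \\
[a,a] &= \{x \in \mathbb{R} \mid a \le x \le a\} = \{a\}.
\end{align*}
```

The first three sets are empty because any such x would give a < a by Lemma 2.3.3 (2), contradicting the Trichotomy Law; the last equality uses Lemma 2.3.3 (1).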


We note that there are no intervals that are “closed” at ∞ or −∞ (for example,
there is no interval of the form [a,∞]), because ∞ is not a real number, and therefore
it cannot be included in an interval contained in the real numbers. The symbol “∞” is
simply a shorthand way of saying that an interval “goes on forever.”

In the following lemma, and in many places throughout this text (and in virtually
all real analysis texts), we will make use of the following convenient, but not entirely
proper, phraseology. The symbols ε and δ (and other Greek letters) are often used to
denote positive real numbers, which we often think of intuitively as being very small,
though in principle they could be any size. Because this situation is so common, rather
than quantifying such ε and δ properly by saying “for all ε ∈ R such that ε > 0” or
“there is some δ ∈ R such that δ > 0,” we will simply say “for all ε > 0” or “there
is some δ > 0.” In other words, it will always be assumed that “ε” and “δ” in such
situations are real numbers.

The following lemma about intervals is simple but very useful. The second part of 
the lemma is essentially what characterizes open intervals. The reader will learn more 
about “openness” in an introductory course in point set topology; see [Mun00]. 


Lemma 2.3.7. Let I ⊆ R be an interval.


1. If x, y ∈ I and x ≤ y, then [x,y] ⊆ I.
2. If I is an open interval, and if x ∈ I, then there is some δ > 0 such that
[x − δ, x + δ] ⊆ I.


Proof. Left to the reader in Exercise 2.3.6. 


A very useful concept in real analysis is the absolute value of real numbers, which 
is defined as follows. 


Definition 2.3.8. Let a ∈ R. The absolute value of a, denoted |a|, is defined by


|a| = a, if a ≥ 0,
|a| = −a, if a < 0. △
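
As a quick illustration of how the two cases of the definition are applied, consider the numbers 2, 0 and −2. These particular values are chosen here only as examples; the fact that 2 > 0 follows from Lemma 2.3.3 (8) and Lemma 2.3.5 (1).

```latex
\begin{align*}
|2|  &= 2           && \text{because } 2 = 1 + 1 > 0, \\
|0|  &= 0           && \text{because } 0 \ge 0, \\
|-2| &= -(-2) = 2   && \text{because } -2 < 0 \text{ by Lemma 2.3.3 (5), then Lemma 2.3.2 (13).}
\end{align*}
```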


The following lemma states the basic properties of absolute value.


Lemma 2.3.9. Let a, b ∈ R.


1. |a| ≥ 0, and |a| = 0 if and only if a = 0.

2. −|a| ≤ a ≤ |a|.

3. |a| = |b| if and only if a = b or a = −b.

4. |a| < b if and only if −b < a < b, and |a| ≤ b if and only if −b ≤ a ≤ b.

5. |ab| = |a|·|b|.

6. |a + b| ≤ |a| + |b| (Triangle Inequality).

7. |a| − |b| ≤ |a + b| and |a| − |b| ≤ |a − b|.


Proof. We will prove Parts (2), (4), (5) and (6), leaving the rest to the reader in
Exercise 2.3.7. 


(2) There are two cases. First, suppose that a ≥ 0. Then |a| = a, and hence |a| ≥ 0.
By Lemma 2.3.3 (5) we see that −|a| ≤ 0. It then follows from Lemma 2.3.3 (2)
that −|a| ≤ a ≤ |a|. Second, suppose that a < 0. Then −a = |a|, and hence by
Lemma 2.3.2 (13) we see that −|a| = a. By Part (1) of this lemma we know that
|a| ≥ 0, and Lemma 2.3.3 (2) then implies that −|a| ≤ a ≤ |a|.


(4) Suppose that |a| < b. By Part (1) of this lemma we know that |a| ≥ 0, and
hence by Lemma 2.3.3 (2) we deduce that b > 0. It follows from Lemma 2.3.3 (5)
that −b < 0. There are now two cases. First, suppose that a ≥ 0. Then |a| = a, and
hence |a| < b implies that a < b. We saw that −b < 0, and hence by Lemma 2.3.3 (2)
we know that −b < a. Therefore −b < a < b. Second, suppose that a < 0. Because
b > 0, it follows from the Transitive Law that a < b. Because |a| = −a and |a| < b,
we see that −a < b. We now use Lemma 2.3.3 (6) and Lemma 2.3.2 (13) to see that
−b < a. Again, we deduce that −b < a < b.

Now suppose that −b < a < b. By Lemma 2.3.3 (6) and Lemma 2.3.2 (13) it
follows that −b < −a < b. Because |a| equals either a or −a, we therefore see in
either case that |a| < b.

The fact that |a| ≤ b if and only if −b ≤ a ≤ b follows from what we have already
seen together with Part (3) of this lemma; we omit the details.


(5) There are four cases. First, suppose that a ≥ 0 and b ≥ 0. By Lemma 2.3.5 (3)
it follows that ab ≥ 0. Therefore |a| = a, and |b| = b, and |ab| = ab. Hence |ab| =
ab = |a|·|b|.

Second, suppose that a ≥ 0 and b < 0. Then a > 0 or a = 0. There are now
two subcases. First, suppose that a > 0. Then by Lemma 2.3.5 (5) we know that
ab < 0. Therefore |a| = a, and |b| = −b, and |ab| = −ab. Hence by Lemma 2.3.2 (12)
we deduce that |ab| = −ab = a(−b) = |a|·|b|. Second, suppose that a = 0. Then
by Lemma 2.3.2 (7) we see that ab = 0. We therefore see that |a| = a = 0, and
|b| = −b, and |ab| = ab = 0. Hence, using Lemma 2.3.2 (7) again, we see that
|ab| = 0 = 0·(−b) = |a|·|b|.

Third, suppose that a < 0 and b ≥ 0. This case is just like the previous case, and
we omit the details.

Fourth, suppose that a < 0 and b < 0. Then by Lemma 2.3.5 (4) we know that
ab > 0. We therefore see that |a| = −a, and |b| = −b, and |ab| = ab. Hence, using
Lemma 2.3.2 (12) (13) we see that |ab| = ab = −(−ab) = −[a(−b)] = (−a)(−b) =
|a|·|b|.


(6) By Part (2) of this lemma we know that −|a| ≤ a ≤ |a| and −|b| ≤ b ≤ |b|.
Using Lemma 2.3.3 (4) we deduce that (−|a|) + (−|b|) ≤ a + b ≤ |a| + |b|. By
Lemma 2.3.2 (4) we see that −(|a| + |b|) ≤ a + b ≤ |a| + |b|, and it now follows
from Part (4) of this lemma that |a + b| ≤ |a| + |b|.


It is hard to overstate the importance for real analysis of the Triangle Inequality 
(Lemma 2.3.9 (6)) and its variants in Lemma 2.3.9 (7). We will use these inequalities 
repeatedly in proofs throughout this text. Also, the Triangle Inequality can be extended 
to the sum of more than two numbers, as stated in Exercise 2.5.3, and we will use that 
version repeatedly as well. 
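
The variants in Lemma 2.3.9 (7) are left to Exercise 2.3.7, but the idea is short enough to sketch. One standard route, given here as an outline rather than as the intended solution, is to write a as a sum and apply the Triangle Inequality, using the fact that |−b| = |b| (which follows from Part (3) of the lemma):

```latex
\begin{align*}
|a| &= |(a - b) + b| \le |a - b| + |b|
  &&\Longrightarrow\quad |a| - |b| \le |a - b|, \\
|a| &= |(a + b) + (-b)| \le |a + b| + |-b| = |a + b| + |b|
  &&\Longrightarrow\quad |a| - |b| \le |a + b|.
\end{align*}
```

In each line the final implication subtracts |b| from both sides, which is justified by Lemma 2.3.3 (3).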

The results that we have proved up till now in the present section should be very 
familiar (that is, the statements of the results should be familiar, not necessarily the 
proofs). The following result, by contrast, though quite intuitively reasonable, might
be one that the reader has not previously encountered. It happens on occasion in real
analysis that we will want to prove that a real number a is equal to zero, but rather 
than doing so directly, which is sometimes difficult, we do so indirectly by proving 
that |a| is smaller than any positive real number. The following lemma shows that this 
type of proof is valid. 


Lemma 2.3.10. Let a ∈ R.


1. a ≤ 0 if and only if a < ε for all ε > 0.
2. a ≥ 0 if and only if a > −ε for all ε > 0.
3. a = 0 if and only if |a| < ε for all ε > 0.


Proof. We will prove Parts (1) and (3), leaving the remaining part to the reader in 
Exercise 2.3.12. 


(1) Suppose that a ≤ 0. Let ε > 0. Then a < ε by Lemma 2.3.3 (2).

Now suppose that a < ε for all ε > 0. We use proof by contradiction. Suppose
that a > 0. Then, taking ε = a, we obtain a < a, which is a contradiction to the
Trichotomy Law, because a = a. Hence a > 0 is false, and by the Trichotomy Law
it follows that a ≤ 0.


(3) Suppose that a = 0. Then |a| = a = 0. Therefore |a| ≤ 0. By Part (1) of this
lemma we deduce that |a| < ε for all ε > 0.

Now suppose that |a| < ε for all ε > 0. It follows from Part (1) of this lemma
that |a| ≤ 0. On the other hand, we know by the first part of Lemma 2.3.9 (1) that
|a| ≥ 0. By Lemma 2.3.3 (1) we deduce that |a| = 0. We now use the second part of
Lemma 2.3.9 (1) to conclude that a = 0.
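
A typical application of Part (3), sketched here with hypothetical real numbers x and y that are not part of the lemma, is to prove that two numbers are equal by showing that their difference is smaller in absolute value than every positive number:

```latex
\[
\bigl(\, |x - y| < \varepsilon \ \text{ for all } \varepsilon > 0 \,\bigr)
\;\Longrightarrow\; x - y = 0
\;\Longrightarrow\; x = y .
\]
```

The first implication is Part (3) applied with a = x − y, and the second follows by adding y to both sides.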


Not only is Lemma 2.3.10 technically useful, but it also has a very important 
philosophical consequence. In the early development of calculus, before the modern 
concept of a limit was developed, the notion of “infinitesimal” was used, where
an infinitesimal is some sort of number that is “infinitely small,” which means a
non-zero number that has absolute value smaller than every positive real number (technically,
the number zero is also considered to be an infinitesimal, though it is only non-zero
infinitesimals that are useful). Although infinitesimals were conceptually important in 
the early development of calculus, Lemma 2.3.10 implies that there are no non-zero 
infinitesimals among the real numbers, and as such standard modern treatments of 
real analysis (such as ours) make no use of infinitesimals. (Infinitesimals can in fact be 
developed rigorously, but not as numbers that are included in the set of real numbers; 
rather, they are to be considered as part of a larger set that contains the real numbers, 
infinitesimals and infinitely large numbers. See [Gol98] for a rigorous treatment of 
infinitesimals, and see [Kei] or [HK03] for an elementary treatment of calculus using 
infinitesimals instead of limits.) 

Now that we have seen proofs of the basic algebraic properties of the real numbers, 
we will usually use these properties without reference, to avoid cluttering up difficult 
proofs with very simple and familiar facts.

Reflections


The various properties of the real numbers proved in this section are intuitively 
straightforward, and most of them should be familiar to the reader from previous 
courses; these properties also hold for any other ordered field, such as the rational 
numbers. However, some of the topics in this section receive more attention in real 
analysis than in previous courses, for example the use of intervals. Open intervals and 
closed intervals are used in precalculus and calculus courses, but their importance, 
and especially the substantial difference between these two types of intervals, is 
not stressed there. In real analysis, by contrast, intervals have a very important role 
to play. For example, the Extreme Value Theorem (Theorem 3.5.1), which is an 
important result about continuous functions, very much depends upon the fact that the 
domain is a closed bounded interval, and not some other type of interval. A complete 
understanding of the difference between open intervals and closed intervals awaits 
the reader in an introductory topology course, where these two types of intervals are 
seen as special cases of the more general concepts of open sets and closed sets, and 
where open sets in particular have a starring role. 


Exercises 


Exercise 2.3.1. [Used in Lemma 2.3.2.] Prove Lemma 2.3.2 (2) (3) (4) (6) (8) (10) (12) 
(13) (14) (16) (17). 


Exercise 2.3.2. [Used in Lemma 2.3.3.] Prove Lemma 2.3.3 (2) (4) (6) (9) (10) (13) 
(14). 


Exercise 2.3.3. [Used in Example 4.5.3 and Example 4.6.1.] For any a ∈ R, let a³
denote a·a·a.
Let x, y ∈ R.


(1) Prove that if x < y, then x³ < y³.
(2) Prove that there are c, d ∈ R such that c³ < x < d³.


Exercise 2.3.4. [Used in Lemma 2.3.5.] Prove Lemma 2.3.5 (2) (3) (4). 
Exercise 2.3.5. [Used in Exercise 2.3.6 and Exercise 2.8.9.] 


(1) Prove that 1 < 2.
(2) Prove that 0 < 1/2 < 1.
(3) Prove that if a, b ∈ R and a < b, then a < (a + b)/2 < b.


Exercise 2.3.6. [Used in Lemma 2.3.7.] Prove Lemma 2.3.7. [Use Exercise 2.3.5 (3).] 
Exercise 2.3.7. [Used in Lemma 2.3.9.] Prove Lemma 2.3.9 (1) (3) (7). 


Exercise 2.3.8. [Used throughout.] Let I ⊆ R be an open interval, let c ∈ I and let δ >
0. Prove that there is some x ∈ I − {c} such that |x − c| < δ. [Use Exercise 2.3.5 (3).]


Exercise 2.3.9. [Used in Theorem 10.4.4 and Exercise 10.4.4.] Let a ∈ R, let R ∈
(0,∞) and let x ∈ (a − R, a + R). Prove that there is some P ∈ (0,R) such that x ∈
(a − P, a + P).


Exercise 2.3.10. [Used in Theorem 8.4.8.] Let I ⊆ R be a non-degenerate open
interval, and let c ∈ R. Prove that there is a non-degenerate open bounded interval
(a,b) ⊆ I such that c ∉ [a,b].


Exercise 2.3.11. [Used throughout.] Let A ⊆ R be a set. Prove that A is bounded if
and only if there is some M ∈ R such that M > 0 and that |x| ≤ M for all x ∈ A.


Exercise 2.3.12. [Used in Lemma 2.3.10.] Prove Lemma 2.3.10 (2).


Exercise 2.3.13. [Used in Exercise 2.5.15.] Let a, b, x, y ∈ R. Suppose that a ≤ x ≤ b
and a ≤ y ≤ b. Prove that |x − y| ≤ b − a.


Exercise 2.3.14. Let [a,b] ⊆ R be a non-degenerate closed bounded interval, and let
x, y ∈ [a,b]. Let x′ = a + b − x.


(1) Prove that if x ∈ [a, (a + b)/2] then x′ ∈ [(a + b)/2, b], and if x ∈ [(a + b)/2, b]
then x′ ∈ [a, (a + b)/2].
(2) Prove that if x ∈ [a, (a + b)/2] and y ∈ [(a + b)/2, b], then |x′ − y| ≤ |x − y|.


2.4 Finding the Natural Numbers, the Integers and the Rational 
Numbers in the Real Numbers 


Inside the set of real numbers sit three important, and very familiar, sets of numbers: 
the natural numbers, the integers and the rational numbers. The reader who has read 
Chapter 1 (starting in either Section 1.2 or 1.4) has already seen a rigorous treatment
of these three sets of numbers, and should skip the present section and proceed straight 
to Section 2.5. 

On the other hand, the reader who commenced the study of the real numbers 
with the axioms for the real numbers given in Section 2.2 has not yet seen a rigorous 
treatment of the natural numbers, the integers and the rational numbers. The goal of 
the present section is to show that using only the axioms for the real numbers these 
three sets of numbers can be found relatively easily inside the set of real numbers, 
and that these sets of numbers behave just as one would expect. In fact, as was the 
case in Section 2.3, in order to find the natural numbers, the integers and the rational 
numbers in the real numbers, we will not need all the properties of the real numbers, 
but rather only the axiom for an ordered field given in Definition 2.2.1. 

We start with the set of natural numbers. What distinguishes this set of numbers 
from other sets of numbers? One answer might be that the natural numbers are 
“discrete,” in that there is a minimum distance (which is 1) between any two natural 
numbers. However, the integers are also “discrete” in this same sense, so that property 
alone does not characterize the natural numbers. One way of viewing the difference 
between the set of natural numbers and the set of integers is that the former does 
not contain negative numbers, whereas the latter does. Another intuitive way of 
differentiating between the natural numbers and the integers is that the former goes 
to infinity “in one direction” whereas the latter goes to infinity “in two directions.” 
A more formal way of viewing this distinction between the natural numbers and the
integers is the ability to do proof by induction in the natural numbers, but not in the
integers. (We assume that the reader is familiar, at least informally, with proof by 
induction, which we will treat in more detail in Section 2.5.) The problem is how 
to phrase this ability to do proof by induction using only the algebraic properties of 
the real numbers that we have seen so far. The idea that resolves this problem is that 
because the natural numbers are tied up with the idea of doing proof by induction, 
then the natural numbers must certainly contain the number 1, and if a real number n 
is a natural number, then surely n + 1 is also a natural number.


Definition 2.4.1. Let S ⊆ R be a set. The set S is inductive if it satisfies the following
two properties.


(a) 1 ∈ S.
(b) If a ∈ S, then a + 1 ∈ S. △


There are many inductive subsets of R; for example, the set R is an inductive
subset of itself. On the other hand, the set {1} is not an inductive set. As seen in the 
following definition and lemma, the set of natural numbers is the smallest inductive 
subset of R. 
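
The two examples just mentioned can be checked directly against properties (a) and (b) of Definition 2.4.1. The following summary is only an illustrative sketch of that check:

```latex
\begin{align*}
\mathbb{R}:&\quad 1 \in \mathbb{R}, \ \text{and } a \in \mathbb{R} \Rightarrow a + 1 \in \mathbb{R}
  && \text{(inductive)} \\
\{1\}:&\quad 1 \in \{1\}, \ \text{but } 1 + 1 \notin \{1\}
  && \text{(not inductive)}
\end{align*}
```

To see that 1 + 1 is not in {1}, observe that if 1 + 1 = 1 then Lemma 2.3.2 (2) would give 1 = 0, which contradicts Non-Triviality.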


Definition 2.4.2. The set of natural numbers, denoted N, is the intersection of all
inductive subsets of R. △


Lemma 2.4.3.


1. N is inductive.
2. If A ⊆ R and A is inductive, then N ⊆ A.
3. If n ∈ N then n ≥ 1.


Proof. 


(1) Let 𝒜 be the collection of all inductive subsets of R. Then N = ⋂_{X∈𝒜} X.
Because X is inductive for all X ∈ 𝒜, we know that 1 ∈ X for all X ∈ 𝒜. Hence
1 ∈ ⋂_{X∈𝒜} X. Next, let b ∈ ⋂_{X∈𝒜} X. Then b ∈ X for all X ∈ 𝒜. Therefore b + 1 ∈ X for
all X ∈ 𝒜, and it follows that b + 1 ∈ ⋂_{X∈𝒜} X. We deduce that ⋂_{X∈𝒜} X is inductive.


(2) This fact follows immediately from the definition of N as the intersection of
all inductive subsets of R.


(3) First, we verify that the interval [1,∞) is an inductive set. Clearly 1 ∈ [1,∞).
Let x ∈ [1,∞). Then x ≥ 1. We know that x < x + 1, and therefore x + 1 > 1. It follows
that x + 1 ∈ [1,∞), and therefore [1,∞) is inductive. We now use Part (2) of this lemma
to deduce that N ⊆ [1,∞). It follows immediately that if n ∈ N then n ≥ 1.


Our next task is to show that the Peano Postulates for the natural numbers can 
be recovered from Definition 2.4.2. For the reader who is familiar with the Peano 
Postulates (for example, if the reader has read Chapter 1, where these postulates 
were given as Axiom 1.2.1 in Section 1.2, and as Theorem 1.4.8 in Section 1.4), 
we now see that the set of natural numbers that we have just located inside the real 
numbers behaves in the exact same way as the set of natural numbers that the reader
has previously encountered. For the reader who has not previously encountered the
Peano Postulates, it is worth seeing them now, not only because these postulates can 
be used as axioms for the natural numbers, and they therefore shed some light on 
what makes the natural numbers what they are, but in particular because Part (c) of 
the Peano Postulates is the formal statement that proof by induction works for the 
natural numbers. Hence, the ability to do proof by induction ultimately follows from 
the axioms for the real numbers. 


Theorem 2.4.4 (Peano Postulates). Let s: N → N be defined by s(n) = n + 1 for all
n ∈ N.


a. There is no n ∈ N such that s(n) = 1.

b. The function s is injective.

c. Let G ⊆ N be a set. Suppose that 1 ∈ G, and that if g ∈ G then s(g) ∈ G. Then
G = N.


Proof. 


(a) Suppose to the contrary that there is some x ∈ N such that s(x) = 1. Then
x + 1 = 1, which implies that x = 0. However, we know by Lemma 2.4.3 (3) that
x ≥ 1, which is a contradiction to the fact that 0 < 1.


(b) Suppose that s(n) = s(m) for some n, m ∈ N. Then n + 1 = m + 1, and hence
n = m. Therefore s is injective.


(c) By hypothesis, we are assuming that G is inductive. Lemma 2.4.3 (2) then
implies that N ⊆ G. Because G ⊆ N, we deduce that G = N.
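
Part (c) can be turned into the familiar form of proof by induction as follows. The statement P(n) below is a hypothetical placeholder for whatever property is to be proved for all natural numbers, and the outline is only a sketch of the idea that Section 2.5 treats in detail.

```latex
\begin{align*}
&\text{Let } G = \{\, n \in \mathbb{N} \mid P(n) \text{ is true} \,\} \subseteq \mathbb{N}. \\
&\text{Base case: } P(1) \text{ is true, so } 1 \in G. \\
&\text{Inductive step: if } P(g) \text{ is true then } P(g+1) \text{ is true, so }
  g \in G \Rightarrow s(g) \in G. \\
&\text{Conclusion: } G = \mathbb{N} \text{ by Part (c), that is, } P(n)
  \text{ is true for all } n \in \mathbb{N}.
\end{align*}
```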


The following lemma, which is a nice application of proof by induction, shows 
that the natural numbers are closed under addition and multiplication; this proof is 
written in the style of Part (c) of the Peano Postulates (Theorem 2.4.4), rather than in 
the more familiar style usually used for proofs by induction, in order to emphasize 
that we are using nothing other than what we have proved so far. 


Lemma 2.4.5. Let a, b ∈ N. Then a + b ∈ N and ab ∈ N.


Proof. Let
G = {x ∈ N | x + y ∈ N for all y ∈ N}.


Clearly G ⊆ N. We will prove that G = N using Part (c) of the Peano Postulates
(Theorem 2.4.4), and it will then follow that x + y ∈ N for all x, y ∈ N.

We first show that 1 ∈ G. Let d ∈ N. Then d + 1 ∈ N, because N is inductive
by Lemma 2.4.3 (1). Hence 1 + d ∈ N. It follows that 1 ∈ G. Now suppose that
e ∈ G. We need to show that e + 1 ∈ G. Let c ∈ N. By the definition of G, we
know that e + c ∈ N. Because N is inductive, it follows that (e + c) + 1 ∈ N. Hence
(e + 1) + c = (e + c) + 1 ∈ N. We deduce that e + 1 ∈ G. Hence G satisfies the
hypotheses of Part (c) of the Peano Postulates, and it follows that G = N.

Next, let

H = {x ∈ N | xy ∈ N for all y ∈ N}.

Clearly H ⊆ N. We will show that H = N, which will then imply that xy ∈ N for all
x, y ∈ N.

Let u ∈ N. Then 1·u = u ∈ N. Hence 1 ∈ H. Now let v ∈ H. We will show that
v + 1 ∈ H. Let g ∈ N. By the definition of H, we know that vg ∈ N. Using what we have
already seen in this proof about addition in N, we deduce that (v + 1)g = vg + g ∈ N.
It follows that v + 1 ∈ H. Hence H satisfies the hypotheses of Part (c) of the Peano
Postulates, and we conclude that H = N.
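
The one algebraic step in the second half of the proof, the expansion of (v + 1)g, can be spelled out from the field axioms. The following sketch assumes that the Distributive Law is stated in the form a(b + c) = ab + ac, as it is used in the proof of Lemma 2.3.2 (7):

```latex
\begin{align*}
(v + 1)g &= g(v + 1)       && \text{Commutative Law for Multiplication} \\
         &= gv + g \cdot 1 && \text{Distributive Law} \\
         &= vg + g         && \text{Commutative and Identity Laws for Multiplication}
\end{align*}
```

Because vg and g are both natural numbers, the first half of the proof then shows that their sum is a natural number.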


We now prove a very important fact about the natural numbers known as the 
Well-Ordering Principle, which is essentially an alternative view of what characterizes 
the natural numbers. The Well-Ordering Principle (which was also stated as
Theorem 1.2.10, and was taken axiomatically as part of Axiom 1.4.4) captures the intuitive
notion that the natural numbers are “discrete” and go to infinity “in one direction” 
by asserting that every non-empty subset of N has a smallest element (though not 
necessarily a largest one). 


Theorem 2.4.6 (Well-Ordering Principle). Let GC N be a non-empty set. Then 
there is some m € G such that m < g for all g € G. 


Proof. Let

B = {a ∈ N | a ≤ g for all g ∈ G}.

By Lemma 2.4.3 (3) we know that 1 ∈ B, and hence B ≠ ∅.

As a first step we show that B is not inductive. Suppose to the contrary that
B is inductive. We know by definition that B ⊆ N, and it then follows from
Theorem 2.4.4 (c) that B = N. Hence G ⊆ B. Because G ≠ ∅, there is some p ∈ G, and
hence p ∈ B. Because B is inductive it follows that p + 1 ∈ B. By the definition of
B we know p + 1 ≤ g for all g ∈ G, and hence in particular p + 1 ≤ p, which is a
contradiction. We deduce that B is not inductive.

Because 1 ∈ B, the fact that B is not inductive implies that there is some m ∈ B
such that m + 1 ∉ B. By the definition of B, we know that m ≤ g for all g ∈ G.

We will show that m ∈ G. Suppose to the contrary that m ∉ G. Then m < g for all
g ∈ G. Because m + 1 ∉ B, there is some w ∈ G such that m + 1 ≰ w, which means
that w < m + 1. We can then use Theorem 2.4.10 (1) to deduce that w + 1 ≤ m + 1
(we have not yet reached that theorem because it is stated in terms of Z, which we
do to avoid having to prove it separately for each of N and Z, but the reader can be
assured that we do not use the present theorem in the proof of Theorem 2.4.10, so
there is no circular reasoning here). This last inequality implies that w ≤ m. We now
have a contradiction to the fact that m < g for all g ∈ G. It follows that m ∈ G, and
the proof is complete.
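
The search carried out in the proof above can be imitated on a finite set. The following is only a sketch, not part of the proof: the function name least_element is our own, G is assumed to be a non-empty finite Python set of positive integers, and membership in B is tested directly from the definition of B.

```python
def least_element(G):
    """Return the least element of a non-empty finite set G of positive
    integers by following the proof: find m in B with m + 1 not in B."""
    if not G:
        raise ValueError("G must be non-empty")

    def in_B(a):
        # a is in B exactly when a <= g for all g in G.
        return all(a <= g for g in G)

    m = 1  # 1 is in B, because 1 <= n for every natural number n
    while in_B(m + 1):
        m += 1
    # Now m is in B but m + 1 is not; the proof shows that m is in G.
    assert m in G
    return m


print(least_element({12, 7, 30}))  # prints 7
```

The loop terminates because B is not inductive, which is exactly the first step of the proof.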


We now turn to the integers, which consist of the natural numbers, their negatives 
and zero. 


Definition 2.4.7. Let

-N = {x ∈ R | x = -n for some n ∈ N}.

The set of integers, denoted Z, is defined by

Z = -N ∪ {0} ∪ N. △


The following lemma shows the relation between the sets N and Z. 
Lemma 2.4.8. 


1. N ⊆ Z.
2. a ∈ N if and only if a ∈ Z and a > 0.
3. The three sets -N, {0} and N are mutually disjoint.


Proof. 
(1) This part of the lemma follows immediately from the definition of Z. 


(2) Because of Part (1) of this lemma, it is sufficient to show that if a ∈ Z, then
a ∈ N if and only if a > 0.

Let a ∈ Z. First, suppose that a ∈ N. Lemma 2.4.3 (3) then implies that a ≥ 1 > 0.
Second, suppose that a > 0. By the definition of Z we know that a ∈ N or a ∈ {0} or
a ∈ -N. First, suppose that a ∈ N. Then there is nothing to prove. Second, suppose
that a ∈ -N. Then a = -n for some n ∈ N. Hence -a = n, and therefore -a ∈ N. By
Lemma 2.4.3 (3) we deduce that -a ≥ 1, which implies that a ≤ -1. Hence a < 0,
which is a contradiction to the fact that a > 0. Third, suppose that a ∈ {0}. Then
a = 0. Again we have a contradiction to the fact that a > 0. We conclude that a ∈ N.


(3) It follows from Part (2) of this lemma that a ∈ -N if and only if a ∈ Z and
a < 0. The Trichotomy Law then implies that the three sets -N, {0} and N are
mutually disjoint.


The following lemma shows that the integers are closed under addition, multipli- 
cation and negation. 


Lemma 2.4.9. Let a, b ∈ Z. Then a + b ∈ Z, and ab ∈ Z, and -a ∈ Z.


Proof. We start by showing that a + b ∈ Z and ab ∈ Z. There are five cases.

First, suppose that a = 0 or b = 0. Then a + b = a or a + b = b, and in either
case a + b ∈ Z. We also have ab = 0 ∈ Z. Second, suppose that a ∈ N and b ∈ N.
By Lemma 2.4.5 we see that a + b ∈ N ⊆ Z and ab ∈ N ⊆ Z. Third, suppose
that a ∈ -N and b ∈ -N. Then a = -n and b = -m for some n, m ∈ N. By
Lemma 2.4.5 we deduce that a + b = (-n) + (-m) = -(n + m) ∈ -N ⊆ Z, and that
ab = (-n)(-m) = -[-nm] = nm ∈ N ⊆ Z. Fourth, suppose that a ∈ N and b ∈ -N.
Then b = -m for some m ∈ N. If a > m, then by Exercise 2.4.1 we see that
a + b = a + (-m) = a - m ∈ N. If a < m, then using Exercise 2.4.1 again we see
that a + b = (-(-a)) + (-m) = -[(-a) + m] = -[m - a] ∈ -N. If a = m, then
a + b = a + (-m) = a + (-a) = 0 ∈ Z. Regardless of whether a or m is larger, we use
Lemma 2.4.5 to see that ab = a(-m) = -am ∈ -N ⊆ Z. Fifth, suppose that a ∈ -N
and b ∈ N. This case is similar to the fourth case, and we omit the details.

Finally, we show that -a ∈ Z. If a = 0, then -a = -0 = 0 ∈ Z. If a ∈ N, then
-a ∈ -N ⊆ Z. If a ∈ -N, then a = -n for some n ∈ N, and therefore -a = -(-n) =
n ∈ N ⊆ Z.


Each of the three parts of the following theorem is a version of what it means 
when we say informally that the integers are “discrete.” 


Theorem 2.4.10. Let a, b ∈ Z.

1. If a < b then a + 1 ≤ b.
2. There is no c ∈ Z such that a < c < a + 1.
3. If |a - b| < 1 then a = b.


Proof. 


(1) Suppose that a < b. Then b - a > 0. By Lemma 2.4.9 we know that
b - a = b + (-a) ∈ Z. It follows from Lemma 2.4.8 (2) that b - a ∈ N. We now use
Lemma 2.4.3 (3) to deduce that b - a ≥ 1. Hence a + 1 ≤ b.

(2) Suppose that there is some c ∈ Z such that a < c < a + 1. By applying Part (1)
of this theorem to the left-hand inequality we deduce that a + 1 ≤ c, which is a
contradiction to the fact that c < a + 1.

(3) Suppose that |a - b| < 1. Then -1 < a - b < 1, and hence b - 1 < a < b + 1.
We know that precisely one of a < b or a = b or a > b holds. If a > b, we deduce that
b < a < b + 1, which is a contradiction to Part (2) of this theorem. If a < b, then we
deduce that b - 1 < a < b, which means b - 1 < a < (b - 1) + 1, which again is a
contradiction. We conclude that a = b.


We now turn to the rational numbers (which are also called fractions). In going 
from the integers to the rational numbers, we lose the notion of “discreteness,” but we 
gain the use of multiplicative inverses for non-zero rational numbers. 


Definition 2.4.11. The set of rational numbers, denoted Q, is defined by

Q = {x ∈ R | x = a/b for some a, b ∈ Z such that b ≠ 0}.

The set of irrational numbers is the set R - Q. △


If x ∈ Q, then by definition x = a/b for some a, b ∈ Z such that b ≠ 0, though, as the
reader is aware, the integers a and b are not unique. The reader is also aware, at least 
informally, that there exist irrational numbers—if there did not, it would have been 
rather silly of us to have defined the term—though it turns out that it is non-trivial to 
prove that irrational numbers exist, and we will have to wait until Theorem 2.6.11 to 
see that. 

The following lemma shows the relation between the sets Z and Q. 


Lemma 2.4.12.

1. Z ⊆ Q.
2. q ∈ Q and q > 0 if and only if q = a/b for some a, b ∈ N.

Proof. We will prove Part (1), leaving the remaining part to the reader in
Exercise 2.4.7.

(1) Let z ∈ Z. By Lemma 2.4.3 (1) we know that N is inductive, and in particular
that 1 ∈ N. Hence 1 ∈ Z. Because 1 ≠ 0, we can define the fraction z/1, and we then
see that z = z·1 = z·1^{-1} = z/1 ∈ Q. Therefore Z ⊆ Q.


The following lemma demonstrates that the way we learn to manipulate fractions 
in elementary school is indeed valid. 


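Lemma 2.4.13. Let a, b, c, d ∈ Z. Suppose that b ≠ 0 and d ≠ 0.

1. a/b = 0 if and only if a = 0.
2. a/b = 1 if and only if a = b.
3. a/b = c/d if and only if ad = bc.
4. a/b + c/d = (ad + bc)/(bd).
5. -(a/b) = (-a)/b = a/(-b).
6. (a/b)·(c/d) = (ac)/(bd).
7. If a ≠ 0, then (a/b)^{-1} = b/a.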


Proof. We will prove Parts (1), (4) and (7), leaving the rest to the reader in
Exercise 2.4.8.

(1) If a/b = 0, then ab^{-1} = 0, and therefore a = a(b^{-1}b) = (ab^{-1})b = 0·b = 0. If
a = 0, then a/b = 0·b^{-1} = 0.

(4) We compute

a/b + c/d = ab^{-1} + cd^{-1} = ab^{-1}dd^{-1} + cd^{-1}bb^{-1}
          = (ad + bc)(b^{-1}d^{-1}) = (ad + bc)(bd)^{-1}
          = (ad + bc)/(bd).

(7) Suppose that a ≠ 0. We then use Parts (2) and (6) of this lemma to compute

(a/b)·(b/a) = (ab)/(ba) = 1.

Hence (a/b)^{-1} = b/a.
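
As an informal sanity check, and not a substitute for the proof, the parts of Lemma 2.4.13 proved above can be tested on a small range of integers with Python's standard fractions module. The function name check_fraction_rules and the parameter bound are our own choices for this sketch.

```python
from fractions import Fraction
from itertools import product


def check_fraction_rules(bound=4):
    """Test Parts (1), (3), (4) and (7) of Lemma 2.4.13 for all integers
    a, b, c, d with absolute value at most bound and b, d non-zero."""
    values = list(range(-bound, bound + 1))
    nonzero = [v for v in values if v != 0]
    for a, c in product(values, repeat=2):
        for b, d in product(nonzero, repeat=2):
            x, y = Fraction(a, b), Fraction(c, d)
            assert (x == 0) == (a == 0)                       # Part (1)
            assert (x == y) == (a * d == b * c)               # Part (3)
            assert x + y == Fraction(a * d + b * c, b * d)    # Part (4)
            if a != 0:
                assert 1 / x == Fraction(b, a)                # Part (7)
    return True


print(check_fraction_rules())  # prints True
```

Such a check covers only finitely many cases; the lemma itself holds for all integers with non-zero denominators.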


The following corollary is an immediate consequence of Lemma 2.4.13, and we 
omit the proof. 


Corollary 2.4.14. Let a, b ∈ Q. Then a + b ∈ Q, and ab ∈ Q, and -a ∈ Q, and if
a ≠ 0 then a^{-1} ∈ Q.


There are many other algebraic properties of N, Z and Q that one might think to 
prove, though some of these properties, such as the Commutative Law for Addition, 
are trivially true, because we are assuming these properties for addition and
multiplication of real numbers, and N, Z and Q are subsets of the real numbers with the same
addition, multiplication, negation and multiplicative inverse. We have stated all the
distinctive facts about N, Z and Q that we will need for later use, and so we will not 
state any additional properties. 


Reflections 


The material in this section might seem unnecessary upon first encounter. The 
real numbers were defined axiomatically in Section 2.2, and given that the natural 
numbers, the integers and the rational numbers all sit inside the real numbers, what 
more needs to be said about these three sets of numbers? The answer is that in order 
to provide rigorous proofs of various results from calculus, we need proofs of some 
of the properties of these three sets of numbers—properties that do not hold for all 
real numbers. It is not possible to provide a rigorous proof of something without 
using precise definitions of the objects under consideration, and so we need precise 
definitions of exactly which subsets of the real numbers we wish to call the natural 
numbers, the integers and the rational numbers, and that is what we see in this section. 

We have provided proofs of only those properties of the natural numbers, the 
integers and the rational numbers that are needed subsequently in this text. In partic- 
ular, the reader might have noticed the absence of any discussion of the cardinality 
of these sets of numbers. Somewhat surprisingly, although the fact that the set of 
rational numbers is countable and the set of real numbers is uncountable is very 
important in some parts of mathematics, this distinction between the cardinalities of 
the rational numbers and the real numbers is of no direct importance to us in our study 
of real analysis. We will see a proof that the set of real numbers is uncountable in 
Section 8.4, but that is simply a nice application of sequences, rather than something 
useful elsewhere in this text. The difference between the rational numbers and the 
real numbers that is of interest to us in real analysis is not the size of these sets but 
the way the elements of the sets are located in relation to each other. More precisely, 
what concerns us is the fact that the Least Upper Bound Property holds for the real 
numbers, but not for the rational numbers. An additional distinction between these 
two sets of numbers that is relevant to real analysis will be seen in Section 5.8, where 
it is shown that the rational numbers have measure zero, and the real numbers do not. 


Exercises 


Exercise 2.4.1. [Used in Lemma 2.4.9 and Exercise 2.7.2.] Let a, b ∈ N. Prove that
a > b if and only if a - b ∈ N if and only if there is some d ∈ N such that b + d = a.

Exercise 2.4.2. [Used in Theorem 10.5.2.] Let x ∈ R. Prove that at most one of
(x - 1/2, x) and (x, x + 1/2) contains an integer.

Exercise 2.4.3. [Used in Theorem 2.5.4 and Exercise 2.5.13.] Let n ∈ N. Suppose that
n ≠ 1. Prove that there is some b ∈ N such that b + 1 = n.

Exercise 2.4.4. Let a, b ∈ Z. Prove that if ab = 1, then a = 1 and b = 1, or a = -1
and b = -1.

Exercise 2.4.5. [Used in Section 2.6 and Exercise 2.6.12.] Prove that there is no n ∈ Z
such that n^2 = 2.

Exercise 2.4.6. [Used in Theorem 2.6.13.] Let q ∈ Q and x ∈ R - Q.

(1) Prove that q + x ∈ R - Q.
(2) Prove that if q ≠ 0 then qx ∈ R - Q.

Exercise 2.4.7. [Used in Lemma 2.4.12.] Prove Lemma 2.4.12 (2).

Exercise 2.4.8. [Used in Lemma 2.4.13.] Prove Lemma 2.4.13 (2) (3) (5) (6).

Exercise 2.4.9. [Used in Theorem 2.7.1.] Let a/b, c/d ∈ Q. Prove that a/b < c/d if and only
if either cb - ad > 0 and bd > 0, or cb - ad < 0 and bd < 0.

Exercise 2.4.10. [Used in Section 3.5.] Let a, b ∈ Q. Suppose that a > 0. Prove that
there is some n ∈ N such that b < na. Use only the material in Sections 2.3 and 2.4;
do not use the Least Upper Bound Property.


2.5 Induction and Recursion in Practice 


The reader has already encountered the theoretical role of proof by induction via the 
Peano Postulates for the natural numbers in Section 1.2, 1.4 or 2.4. In the present sec- 
tion, by contrast, our purpose is to review the practical use of proof by induction, and 
then to discuss a very important consequence of it, which is Definition by Recursion. 
We assume that the reader has some previous experience with proof by induction, 
and hence we will review such proofs only as much as needed for later purposes, and 
without intuitive motivation. The reader might not be as familiar with Definition by 
Recursion, and hence we will give it a slightly more detailed treatment. 

Proof by induction, often called the “Principle of Mathematical Induction,” is a 
method of proving certain statements involving the natural numbers, and it is quite 
distinct from the informal concept of “inductive reasoning,” which refers to the 
process of going from specific examples to more general statements, and which is 
not restricted to mathematics. More precisely, the method of proof by induction is 
given in the following theorem, which is just a restatement of Part (c) of the Peano 
Postulates, and so we state it without proof. 


Theorem 2.5.1 (Principle of Mathematical Induction). Let G ⊆ N. Suppose that

a. 1 ∈ G;
b. if n ∈ G, then n + 1 ∈ G.

Then G = N.


It is important to note that Part (b) of Theorem 2.5.1 has the form P → Q, and
that to show that Part (b) is true, we do not show that either P or Q is true, but only
that the conditional statement P → Q is true. In other words, we do not need to show
directly that n ∈ G, nor that n + 1 ∈ G, but only that n ∈ G implies n + 1 ∈ G.

Although Theorem 2.5.1 involves the use of a set “G,” in practice it is customary
to avoid mentioning the set G explicitly. Suppose that we are trying to show that
the statement P(n) is true for all n ∈ N. The formal way to proceed would be to let
G = {n ∈ N | P(n) is true}, and then to verify that G = N by showing that 1 ∈ G, and
that n ∈ G implies n + 1 ∈ G for all n ∈ N. The less cumbersome, but equally valid,
way of proceeding is to state that we are trying to prove by induction that P(n) is
true for all n ∈ N, and then to show that P(1) is true, and that if P(n) is true then so
is P(n + 1) for all n ∈ N. The latter of these two steps is often called the “inductive
step,” and the assumption that P(n) is true in the inductive step is often called the
“inductive hypothesis.” An equivalent version of the inductive step involves showing
that if P(n - 1) is true then so is P(n) for all n ∈ N such that n ≥ 2.

We now turn to a very standard example of proof by induction; we will use this 
formula later on in Example 5.2.3 (1). 


Proposition 2.5.2. Let n ∈ N. Then

1^2 + 2^2 + ··· + n^2 = n(n + 1)(2n + 1)/6.    (2.5.1)


Proof. We prove the result by induction on n. First, suppose that n = 1. Then
1^2 + 2^2 + ··· + n^2 = 1, and n(n + 1)(2n + 1)/6 = 1·(1 + 1)·(2·1 + 1)/6 = 1. Therefore
Equation 2.5.1 is true for n = 1. Now let n ∈ N. Suppose that Equation 2.5.1 is true
for n. That is, suppose that

1^2 + 2^2 + ··· + n^2 = n(n + 1)(2n + 1)/6.

We then compute

1^2 + 2^2 + ··· + (n + 1)^2 = {1^2 + 2^2 + ··· + n^2} + (n + 1)^2
    = n(n + 1)(2n + 1)/6 + (n + 1)^2
    = (n + 1)(2n^2 + 7n + 6)/6
    = (n + 1)(n + 2)(2n + 3)/6
    = (n + 1)[(n + 1) + 1][2(n + 1) + 1]/6.

This last expression is precisely the right-hand side of Equation 2.5.1 with n + 1
replacing n. Hence we have proved the inductive step. This completes the proof that
Equation 2.5.1 is true for all n ∈ N.
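
Equation 2.5.1 and the inductive step can be spot-checked numerically. The following sketch uses function names of our own choosing (sum_of_squares, closed_form) and tests only finitely many values of n, so it illustrates the statement rather than proving it.

```python
def sum_of_squares(n):
    """Left-hand side of Equation 2.5.1: 1^2 + 2^2 + ... + n^2."""
    return sum(k * k for k in range(1, n + 1))


def closed_form(n):
    """Right-hand side of Equation 2.5.1; the product is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6


for n in range(1, 201):
    # The formula itself.
    assert sum_of_squares(n) == closed_form(n)
    # The inductive step: adding (n + 1)^2 moves the closed form from n to n + 1.
    assert closed_form(n) + (n + 1) ** 2 == closed_form(n + 1)

print(closed_form(10))  # prints 385
```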


We note that the use of “···” in Proposition 2.5.2 is not completely rigorous, unless
we provide a valid definition of “···” in expressions of the form “a_1 + a_2 + ··· + a_n.”
In general, the notation “···” can be used rigorously only in those situations where it
is an abbreviation for something that has been properly defined. To avoid a digression
at this point we will skip over the definition of “a_1 + a_2 + ··· + a_n” for now, but we
note that it is found in Exercise 2.5.19, using material discussed subsequently in this
section.

There are various alternative versions of proof by induction, each of which is
useful in certain situations where Theorem 2.5.1 might not be directly applicable. For
example, instead of proving that a statement P(n) is true for all n ∈ N, it is possible
to prove that it is true for all n ∈ N such that n ≥ k, for some given k ∈ N, which we
would do by proving that P(k) is true, and then proving that if P(n) is true then so
is P(n + 1) for all n ∈ N such that n ≥ k. We will not be using this variant of proof
by induction in this text, so we will not state and prove it formally. A variant of
Theorem 2.5.1 that we will need starts at n = 1, but has a slightly different type of
inductive step, as seen in the following theorem. We start with some notation.


Definition 2.5.3. Let a, b ∈ Z. The set {a,...,b} is defined by
{a,...,b} = {x ∈ Z | a ≤ x ≤ b}. △


Theorem 2.5.4 (Principle of Mathematical Induction—Variant). Let G C N. Sup- 
pose that 


a. 1 ∈ G;
b. if n ∈ N and {1,...,n} ⊆ G, then n + 1 ∈ G.

Then G = N.


Proof. Suppose that G ≠ N. Let H = N - G. Because H ⊆ N and H ≠ ∅, the
Well-Ordering Principle (Theorem 1.2.10, Axiom 1.4.4 or Theorem 2.4.6) implies that
there is some m ∈ H such that m ≤ h for all h ∈ H. By hypothesis we know that 1 ∈ G,
and hence 1 ∉ H, which implies that m ≠ 1. Using one of Lemma 1.2.3, Exercise 1.4.6
or Exercise 2.4.3, there is some b ∈ N such that b + 1 = m. Let p ∈ {1,...,b}. Then
p ≤ b < b + 1 = m. Hence m ≰ p, and therefore p ∉ H, and so p ∈ G. It follows that
{1,...,b} ⊆ G. By the hypothesis on G we deduce that b + 1 ∈ G, which means that
m ∈ G, which is a contradiction to the fact that m ∈ H. Hence G = N.


When using Theorem 2.5.4, the inductive step involves showing that if the desired 
statement is assumed to be true for all values in {1,...,n}, then it is true for n + 1.
This method contrasts with Theorem 2.5.1, where we showed that if the statement 
is assumed to be true only for n, then it is true for n+ 1. It might appear as if 
Theorem 2.5.4 is unfairly making things too easy by allowing a larger hypothesis 
in order to derive the same conclusion as Theorem 2.5.1, but Theorem 2.5.4 was 
proved rigorously, so there is no cheating here. (Although the proof of Theorem 2.5.4 
uses the Well-Ordering Principle, we note that the Well-Ordering Principle is proved 
by using proof by induction, and hence Theorem 2.5.4 is ultimately derived from 
Theorem 2.5.1.) 

We will see an example of the use of Theorem 2.5.4 in the proof of Theorem 2.8.2. 

We now turn to the issue of definition by recursion, which is related to proof by
induction, but is not the same as it. Definition by recursion involves sequences, a
topic we will study from the perspective of real analysis in Chapter 8. Informally,
a sequence is an infinite list a_1, a_2, a_3, ... of elements of a set. Such a method of
writing sequences often suffices for many purposes, but it is not rigorous to define
something by writing “...,” unless the “...” is provided a rigorous meaning in the
given situation. In our particular case, we can define a sequence in a set H to be a
function f: N → H, where we think of f(1) as the first element of the sequence,
of f(2) as the second element of the sequence and so on; we let a_n = f(n) for all
n ∈ N. Hence, the notation “a_1, a_2, a_3, ...” corresponds to the rigorous definition of a
sequence in terms of a function, and so we will feel free to use this notation.

In Chapter 8 we will prove many important facts about sequences, most of which 
usually start out with the assumption that we are given a sequence or sequences that 
have already been defined. In the present section, by contrast, we discuss a very useful 
way of defining sequences. 

There are two standard ways of defining sequences in practice. Suppose that we
want to give a formula to define the familiar sequence 1, 2, 4, 8, 16, .... Let a_n denote
the n-th term of this sequence for all n ∈ N. The simplest way to define this sequence is
by giving an explicit description for a_n for all n ∈ N, which in this case is a_n = 2^{n-1}
for all n ∈ N. However, although it is easy to find such an explicit formula for a_n in the
case of this very simple sequence, it is not always possible to find such formulas for
more complicated sequences. Another way of describing this sequence is by stating
that a_1 = 1, and that a_{n+1} = 2a_n for all n ∈ N. Such a description is called a recursive
description of the sequence. There are many sequences for which it is much easier
to give a recursive description than an explicit one. Moreover, recursion is important
not only in mathematics, but also in logic, and the applications of logic to computer
science; see [Rob86], [DSW94, Chapter 3] or [End72, Section 1.2] for more about
recursion, and see [Rob84, Section 5.1] for various applications of recursion.

If a sequence in a set H is described by an explicit formula for each a_n in terms
of n, it can be useful to find a recursive formula, but there is no question that the
sequence exists, because the explicit formula defines a function N → H. Suppose, by
contrast, that a sequence in H is given only by a recursive description, but no explicit
formula. For example, suppose that we have a sequence b_1, b_2, b_3, ... in R given
by the recursive description b_1 = 5, and b_{n+1} = 1 + 3(b_n)^2 for all n ∈ N. Is there a
sequence satisfying this description? It appears intuitively as if such a sequence exists
because we can produce one element at a time, starting with b_1 = 5, then computing
b_2 = 1 + 3(b_1)^2 = 1 + 3·5^2 = 76, then b_3 = 1 + 3(b_2)^2 = 1 + 3·76^2 = 17,329 and
so on, proceeding “inductively.” However, a sequence in R is defined as a function
f: N → R, and to show that this recursive description actually produces a sequence,
we would need to find a function f: N → R such that f(1) = 5, and f(2) = 76, and
f(3) = 17,329 and so on, and it is not at all obvious how to define such a function.
In fact, there is such a function, and it is unique, though the justification for this fact,
seen below, is not at all trivial.
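
The “one element at a time” computation described above can be written as a short program. This is only an illustration of the informal procedure, with a function name b of our own choosing; the fact that a program can compute finitely many terms does not by itself establish that a function N → R with the stated properties exists.

```python
def b(n):
    """Return b_n, where b_1 = 5 and b_{n+1} = 1 + 3 * (b_n)^2."""
    if n < 1:
        raise ValueError("n must be a natural number")
    value = 5                       # b_1
    for _ in range(n - 1):
        value = 1 + 3 * value ** 2  # pass from b_j to b_{j+1}
    return value


print([b(n) for n in (1, 2, 3)])  # prints [5, 76, 17329]
```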

The method of describing sequences using Definition by Recursion (also called 
“recursive definition”) can be made completely rigorous, as seen in Theorem 2.5.5 
below, but simply saying something such as “just continue inductively” is not satis- 
factory from a rigorous point of view. Proof by induction works for something that is 
already defined; here we need to prove that our definition actually produces something. 
Unfortunately, in some texts that discuss induction, not only is no proof given of the 
validity of definition by recursion, but no mention is even made of the need for such a 
proof. It is fine to skip a difficult proof, but mention should always be made that the
proof is being skipped. 

The simplest form of definition by recursion works as follows. Suppose that we
are given a non-empty set H, an element e ∈ H and a function k: H → H. We then
want to define a sequence a_1, a_2, ... in H such that a_1 = e, and that a_{n+1} = k(a_n) for
all n ∈ N.

Given that the formal way to define a sequence in H is as a function f: N → H,
we can reformulate definition by recursion as follows: given a set H, an element e ∈ H
and a function k: H → H, is there a function f: N → H such that f(1) = e, and that
f(n + 1) = k(f(n)) for all n ∈ N? The following theorem says that such a function
can always be found. Not surprisingly, proof by induction is the main tool in the proof
of this theorem. The proof of this theorem, which follows [Dea66, Section 3.5], is a
bit trickier than might be expected.


Theorem 2.5.5 (Definition by Recursion). Let H be a set, let e ∈ H and let k: H → H
be a function. Then there is a unique function f: N → H such that f(1) = e, and
that f(n + 1) = k(f(n)) for all n ∈ N.


Proof of Theorem 1.2.4 and Theorem 2.5.5. If you are reading this proof for
Theorem 1.2.4 in Section 1.2, please skip this first paragraph. Let s: N → N be defined
by s(n) = n + 1 for all n ∈ N. The condition that f(n + 1) = k(f(n)) for all n ∈ N
can then be rephrased as f ∘ s = k ∘ f. Recall that the function s satisfies the Peano
Postulates (Theorem 2.4.4).

We have to prove both existence and uniqueness; we start with the latter. Suppose
that there are functions g, h: N → H such that g(1) = e and h(1) = e, and that
g ∘ s = k ∘ g and h ∘ s = k ∘ h. Let

V = {a ∈ N | g(a) = h(a)}.

We will prove that V = N; the fact that g = h will follow immediately. Clearly V ⊆ N.
We know that 1 ∈ V because g(1) = e and h(1) = e. Now let n ∈ V. Then g(n) = h(n).
By the hypotheses on g and h we see that

g(s(n)) = (g ∘ s)(n) = (k ∘ g)(n) = k(g(n))
        = k(h(n)) = (k ∘ h)(n) = (h ∘ s)(n) = h(s(n)).

Hence s(n) ∈ V. It now follows from Part (c) of the Peano Postulates that V = N, and
the proof of uniqueness is complete.

We now prove that a function f with the desired properties exists. A crucial
aspect of this proof is to use the formal definition of functions in terms of sets; see
[Blo10, Definition 4.1.1] for this definition. In particular, we think of functions N → H
as subsets of N × H satisfying certain properties.

Let

C = {W ⊆ N × H | (1,e) ∈ W, and if (n,y) ∈ W then (s(n),k(y)) ∈ W}.

We note that C is non-empty, because the set N × H is in C. Let f = ⋂_{W ∈ C} W. Clearly
f ⊆ N × H. Because (1,e) ∈ W for all W ∈ C, it follows that (1,e) ∈ f. Suppose that
(n,y) ∈ f. Then (n,y) ∈ W for all W ∈ C, and therefore (s(n),k(y)) ∈ W for all W ∈ C.
Hence (s(n),k(y)) ∈ f. We deduce that f ∈ C. Clearly f ⊆ W for all W ∈ C.

We now show that if (n,y) ∈ f and (n,y) ≠ (1,e), then there is some (m,u) ∈ f
such that (n,y) = (s(m),k(u)). Suppose to the contrary that there is some (r,t) ∈ f
such that (r,t) ≠ (1,e), and that there is no (m,u) ∈ f such that (r,t) = (s(m),k(u)).
Let f̂ = f - {(r,t)}. Clearly f̂ ⊆ N × H, and (1,e) ∈ f̂. Let (n,y) ∈ f̂. Then (n,y) ∈ f,
and therefore (s(n),k(y)) ∈ f, as seen above. We know that (r,t) ≠ (s(n),k(y)), and
therefore (s(n),k(y)) ∈ f̂. It follows that f̂ ∈ C, which is a contradiction to the fact
that f ⊆ W for all W ∈ C, because f̂ is a proper subset of f. We deduce that if
(n,y) ∈ f and (n,y) ≠ (1,e), then there is some (m,u) ∈ f such that (n,y) = (s(m),k(u)).

Next, we show that f is a function N → H. Let

G = {a ∈ N | there is a unique x ∈ H such that (a,x) ∈ f}.

We will prove that G = N, and it will follow immediately that f is a function. Clearly
G ⊆ N. Because f ∈ C, we know that (1,e) ∈ f. Now suppose that (1,p) ∈ f for
some p ∈ H such that p ≠ e. We know from the previous paragraph that there is some
(m,u) ∈ f such that (1,p) = (s(m),k(u)). Hence 1 = s(m), which is a contradiction
to Part (a) of the Peano Postulates. Therefore e is the unique element in H such that
(1,e) ∈ f. Hence 1 ∈ G.

Let n ∈ G. Then there is a unique y ∈ H such that (n,y) ∈ f. Because f ∈ C,
we know that (s(n),k(y)) ∈ f. Now suppose that (s(n),q) ∈ f for some q ∈ H. By
Part (a) of the Peano Postulates we know that s(n) ≠ 1, and hence (s(n),q) ≠ (1,e).
Using what we proved above we see that there is some (a,b) ∈ f such that (s(n),q) =
(s(a),k(b)). Therefore s(n) = s(a) and q = k(b). By Part (b) of the Peano Postulates
we know that s is injective, and hence n = a. It follows that (a,b) = (n,b), which
means that (n,b) ∈ f. By the uniqueness of y we deduce that b = y. Hence q = k(b) =
k(y), and therefore k(y) is the unique element of H such that (s(n),k(y)) ∈ f. We
deduce that s(n) ∈ G. It follows that G = N.

Finally, we show that f(1) = e and f ∘ s = k ∘ f. The first of these properties is
equivalent to saying that (1,e) ∈ f, which we have already seen. Let n ∈ N. Then
(n, f(n)) ∈ f, and hence (s(n), k(f(n))) ∈ f. This last statement can be rephrased as
f(s(n)) = k(f(n)). Because this last statement holds for all n ∈ N, we deduce that
f ∘ s = k ∘ f.

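
The set-theoretic viewpoint of the proof can be imitated on a finite scale. The sketch below rests on assumptions of our own: N is truncated to {1,...,limit}, the names least_closed_set and as_function are hypothetical, and instead of intersecting all sets in C we generate the least set containing (1,e) that is closed under passing from (n,y) to (s(n),k(y)). The program then checks that this set of pairs has exactly one pair for each n, as the proof shows for f.

```python
def least_closed_set(e, k, limit):
    """Generate the least set of pairs containing (1, e) and closed under
    (n, y) -> (n + 1, k(y)), truncated to first coordinates 1..limit."""
    pairs = {(1, e)}
    frontier = [(1, e)]
    while frontier:
        n, y = frontier.pop()
        if n < limit:
            new_pair = (n + 1, k(y))
            if new_pair not in pairs:
                pairs.add(new_pair)
                frontier.append(new_pair)
    return pairs


def as_function(pairs, limit):
    """Check that each n in 1..limit occurs in exactly one pair (n, y),
    and return the resulting table n -> y."""
    table = {}
    for n, y in pairs:
        assert n not in table  # uniqueness of y for each n
        table[n] = y
    assert set(table) == set(range(1, limit + 1))  # existence for each n
    return table


f = as_function(least_closed_set(5, lambda y: 1 + 3 * y * y, 4), 4)
print(f[1], f[2], f[3])  # prints 5 76 17329
```

The truncation is what makes the computation finite; the theorem itself concerns all of N and requires the argument given in the proof.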
As an important example of Definition by Recursion (Theorem 2.5.5), we turn
to the notion of raising a real number to an integer power. Let x ∈ R. We defined
x^2 in Definition 2.3.1 (2) simply by letting x^2 = x·x. For higher powers of x, we
intuitively want to define x^n to be x^n = x·x· ··· ·x, where x is multiplied with itself
n times, for any n ∈ N. However, writing x·x· ··· ·x is not a rigorous definition. We
will use Definition by Recursion to eliminate the ···. We note that at present we are
considering x^n only for n ∈ Z. It is certainly possible to define x^r for all r ∈ R, as
long as x > 0, but we do not yet have the tools to do so; we will see the definition in
Section 7.2.

As our first step, we define x^n for all n ∈ N using Definition by Recursion.

Definition 2.5.6. Let x ∈ R. The number x^n ∈ R is defined for all n ∈ N by letting
x^1 = x, and x^{n+1} = x·x^n for all n ∈ N. △


In order to define x^n for n ∈ Z - N, we need the following very simple lemma.

Lemma 2.5.7. Let x ∈ R. Suppose that x ≠ 0. Then x^n ≠ 0 for all n ∈ N.

Proof. Left to the reader in Exercise 2.5.9.


Lemma 2.5.7 allows us to make the following definition. Observe that in this
definition, we encounter a somewhat confusing situation where the expression y^{-1} has
two meanings, one being the result of raising the number y to the -1 power, and the
other being the multiplicative inverse of y. In particular, in the formula “x^{-n} = (x^n)^{-1},”
the term “x^{-n}” always denotes raising a number to a power, even when n = 1, whereas
the term “(x^n)^{-1}” denotes the multiplicative inverse of x^n. Although this ambiguity
of notation might seem confusing at first, it is actually no problem at all, because
substituting n = 1 in the following definition shows that the two meanings of y^{-1} are
actually equal to each other, even if they are conceptually different.


Definition 2.5.8. Let x ∈ R. Suppose that x ≠ 0. The number x^0 ∈ R is defined by
x^0 = 1. For each n ∈ N, the number x^{-n} is defined by x^{-n} = (x^n)^{-1}. △


The following lemma states some very familiar properties of x^n, which we can
now prove rigorously using the above definitions.


Lemma 2.5.9. Let x ∈ R, and let n, m ∈ Z. Suppose that x ≠ 0.

1. x^n x^m = x^{n+m}.
2. x^n / x^m = x^{n-m}.


Proof. We will prove Part (1), leaving the remaining part to the reader in
Exercise 2.5.11.

(1) There are five cases. First, suppose that n > 0 and m > 0. By Lemma 2.4.8 (2)
we know that n, m ∈ N. This part of the proof is by induction. We will use induction
on k to prove that for each k ∈ N, the formula x^k x^p = x^{k+p} holds for all p ∈ N.

Let k = 1. Let p ∈ N. Then by the definition of x^p we see that x^k x^p = x^1·x^p =
x·x^p = x^{p+1} = x^{1+p} = x^{k+p}. Hence the result is true for k = 1.

Now let k ∈ N. Suppose that the result is true for k. Let p ∈ N. Then by the
definition of x^k we see that x^{k+1} x^p = (x·x^k)·x^p = x·(x^k·x^p) = x·x^{k+p} = x^{(k+p)+1} =
x^{(k+1)+p}. Hence the result is true for k + 1. It follows by induction that x^k x^p = x^{k+p}
for all p ∈ N, for all k ∈ N.

Second, suppose that n = 0 or m = 0. Without loss of generality, assume that n = 0.
Then by the definition of x^0 we see that x^n x^m = x^0 x^m = 1·x^m = x^m = x^{0+m} = x^{n+m}.

Third, suppose that n > 0 and m < 0. Then -m > 0, and hence by Lemma 2.4.8 (2)
we know that n ∈ N and -m ∈ N. There are now three subcases.

For the first subcase, suppose that n + m > 0. By Lemma 2.4.9 we know that
n + m ∈ Z, and therefore by Lemma 2.4.8 (2) we know that n + m ∈ N. Let r =
n + m. Then n = r + (-m). Because r > 0 and -m > 0, then by the first case in
this proof, together with the definition of x^{-(-m)}, we see that x^n x^m = x^{r+(-m)} x^{-(-m)} =
(x^r x^{-m})(x^{-m})^{-1} = x^r (x^{-m} (x^{-m})^{-1}) = x^r·1 = x^r = x^{n+m}.

For the second subcase, suppose that n + m = 0. Then m = -n. Using the definition
of x^{-n} we see that x^n x^m = x^n x^{-n} = x^n (x^n)^{-1} = 1 = x^0 = x^{n+m}.

For the third subcase, suppose that n + m < 0. Then (-n) + (-m) = -(n + m) > 0.
Because -m > 0 and -n < 0, we can use the first of our three subcases, together with
the definition of x^{-n}, to see that x^n x^m = ((x^n)^{-1})^{-1} ((x^m)^{-1})^{-1} = (x^{-n})^{-1} (x^{-m})^{-1} =
[x^{-n} x^{-m}]^{-1} = [x^{(-n)+(-m)}]^{-1} = [x^{-(n+m)}]^{-1} = [(x^{n+m})^{-1}]^{-1} = x^{n+m}. We have now
completed the proof in the case where n > 0 and m < 0.

Fourth, suppose that n < 0 and m > 0. This case is similar to the previous case,
and we omit the details.

Fifth, suppose that n < 0 and m < 0. Then -n > 0 and -m > 0, and we can
proceed similarly to the third subcase of the third case of this proof; we omit the
details.
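
Definitions 2.5.6 and 2.5.8 translate directly into a short program, which can then be used to spot-check Part (1) of Lemma 2.5.9. In this sketch the function name power is our own, and exact rational arithmetic (Python's fractions module) stands in for the real numbers; the check covers only a small range of exponents and a single base.

```python
from fractions import Fraction


def power(x, n):
    """Compute x^n for an integer n, following Definitions 2.5.6 and 2.5.8.
    For n <= 0 the number x must be non-zero."""
    if n > 0:
        result = x                  # x^1 = x
        for _ in range(n - 1):
            result = x * result     # x^{j+1} = x * x^j
        return result
    if x == 0:
        raise ZeroDivisionError("x^n is defined for n <= 0 only when x is non-zero")
    if n == 0:
        return Fraction(1)          # x^0 = 1
    return 1 / power(x, -n)         # x^{-n} = (x^n)^{-1}


x = Fraction(-3, 2)
for n in range(-5, 6):
    for m in range(-5, 6):
        assert power(x, n) * power(x, m) == power(x, n + m)

print(power(Fraction(2), -3))  # prints 1/8
```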


Because of Definition 2.5.6, we can now define polynomials. The following 
definition might seem a bit trickier than necessary, but from the perspective of real 
analysis, we are interested in viewing polynomials as functions (so that we can take 
their derivatives and integrals), rather than as expressions made up of sums of powers 
of a “variable” multiplied by constants. 


Definition 2.5.10. Let A ⊆ R be a set, and let f: A → R be a function. The function
f is a polynomial function if there are some n ∈ N ∪ {0} and a_0, a_1, ..., a_n ∈ R such
that f(x) = a_0 + a_1 x + ··· + a_n x^n for all x ∈ A. △


It is important to point out that, as with all functions, the name of the polynomial
defined in Definition 2.5.10 is “f,” not “f(x).” It would be commonly understood that
the notation “f(c)” denotes the value of the polynomial f at the element c € A, and so 
f(c) would be an element of R, which is the codomain of f. Why should “f(x)” mean 
anything different from “f(c),” except that c is one choice of element in the domain, 
and x is another such element? Historically, following Descartes, mathematicians 
have often used letters such as x, y and z to denote “variables,” and letters such as a, b 
and c to denote “constants,” but from a rigorous standpoint there is no such distinction. 
Every element of a set is a single element of the set, that is, it is a “constant.” There is 
actually no such thing as a “variable” (though at times it is convenient to use the word 
“variable” informally). In particular, in careful mathematical writing, we will always 
use the symbol f to denote the name of the polynomial, and more generally the name 
of the function, and we will always use the symbol f(x) to denote an element of the 
codomain. A careless approach in this matter can lead to misunderstandings in some 
tricky situations. 

Although we defined polynomials as functions in Definition 2.5.10, in practice it is
customary to define a polynomial function simply by saying “let a_0 + a_1 x + ··· + a_n x^n
be a polynomial,” and not even mentioning that the polynomial is a function. From
a rigorous point of view, this informal way of defining polynomials is problematic,
because, as mentioned above, there is no real distinction between “constants” and
“variables,” and so the expression “a_0 + a_1 x + ··· + a_n x^n” is not inherently the definition
of a function unless it is stated as such explicitly and properly, which would include
the domain and the codomain. However, because no real problem arises from this
informal way of defining polynomials, for convenience we will stick to standard
practice and define polynomials in this way. The domain and codomain for
polynomials will always be assumed to be R, unless otherwise stated.

Returning to the general issue of definition by recursion, we observe that in
Theorem 2.5.5 each a_{n+1} was a function of a_n alone, given by a formula of the form
a_{n+1} = k(a_n) for all n ∈ N. However, we sometimes need to express a_{n+1} in terms of
n as well as a_n. For example, suppose that we want to define a sequence by setting
a_1 = 1, and a_{n+1} = n + a_n for all n ∈ N. Such a recursive definition does appear to
produce a unique sequence that starts 1, 2, 4, 7, 11, ..., though formally this definition
by recursion is not covered by Theorem 2.5.5. Such situations are handled by the
following variant of Theorem 2.5.5.


Theorem 2.5.11. Let H be a set, let e ∈ H and let t: H × N → H be a function.
Then there is a unique function g: N → H such that g(1) = e, and that g(n + 1) =
t((g(n), n)) for all n ∈ N.


Proof. We will prove that g exists, leaving the proof of uniqueness to the reader in 
Exercise 2.5.16. 

We can apply Theorem 2.5.5 to the set H × N, the element (e, 1) ∈ H × N and the
function r: H × N → H × N defined by r((x, m)) = (t((x, m)), m + 1) for all (x, m) ∈
H × N. Hence there is a unique function f: N → H × N such that f(1) = (e, 1), and
that f(n + 1) = r(f(n)) for all n ∈ N.

Let f_1: N → H and f_2: N → N be the coordinate functions of f. That is, the
functions f_1 and f_2 are the unique functions such that f(n) = (f_1(n), f_2(n)) for all
n ∈ N.

Let g = f_1. Because (g(1), f_2(1)) = f(1) = (e, 1), then g(1) = e.

Let n ∈ N. Then the equation f(n + 1) = r(f(n)) can be rewritten in coordinates
as

(g(n + 1), f_2(n + 1)) = r((g(n), f_2(n))) = (t((g(n), f_2(n))), f_2(n) + 1).

Hence g(n + 1) = t((g(n), f_2(n))) and f_2(n + 1) = f_2(n) + 1. Observe that the second
equation is satisfied if we use f_2 = 1_N, where 1_N: N → N is the identity map. Hence,
by the uniqueness of f, it must be the case that f_2 = 1_N. It now follows that g(n + 1) =
t((g(n), n)).


Example 2.5.12. We want to define a sequence of real numbers a_1, a_2, a_3, ... such that
a_1 = 1, and a_{n+1} = (n + 1)a_n for all n ∈ N. Using Theorem 2.5.11 with e = 1, and with
t(x, m) = (m + 1)x for all (x, m) ∈ R × N, we see that there is a sequence satisfying the
given conditions. This sequence starts 1, 2, 6, 24, 120, ..., and consists of the familiar
factorial numbers. We use the symbol n! to denote a_n. The reader might wonder
whether we could have dispensed with the Definition by Recursion entirely, and just
have explicitly defined a_n = n! for all n ∈ N, but that would be viewing the situation
backwards. The symbol ! is informally defined by n! = n(n − 1)(n − 2) ··· 2 · 1, but
this type of definition is not rigorous, because “···” is not a rigorous concept. The
formal way to define n! without “···” is simply to say that it is the value of a_n for the
sequence we have defined by recursion. Observe that from Definition by Recursion,
we deduce immediately that (n + 1)! = (n + 1)n! for all n ∈ N. ♦
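
As an informal illustration only, the scheme of Theorem 2.5.11 can be sketched in Python. The helper name define_by_recursion is our own assumption, not notation from the text. It follows the idea of the proof: iterate the map r((x, m)) = (t((x, m)), m + 1) on pairs, starting from (e, 1), and read off the first coordinate. A program computes only finitely many values, whereas the theorem asserts that g exists on all of N.

```python
def define_by_recursion(e, t):
    """Return g with g(1) = e and g(n + 1) = t(g(n), n), as in Theorem 2.5.11."""
    def g(n):
        if n < 1:
            raise ValueError("g is defined only on the natural numbers 1, 2, 3, ...")
        pair = (e, 1)                    # f(1) = (e, 1)
        for _ in range(n - 1):
            x, m = pair
            pair = (t(x, m), m + 1)      # f(k + 1) = r(f(k))
        return pair[0]                   # g is the first coordinate of f
    return g

# The sequence a_1 = 1, a_{n+1} = n + a_n discussed before Theorem 2.5.11.
a = define_by_recursion(1, lambda x, m: m + x)
print([a(n) for n in range(1, 6)])           # [1, 2, 4, 7, 11]

# The factorial numbers of Example 2.5.12, using t(x, m) = (m + 1)x.
factorial = define_by_recursion(1, lambda x, m: (m + 1) * x)
print([factorial(n) for n in range(1, 6)])   # [1, 2, 6, 24, 120]
```

The second coordinate of the pair always equals the current index, which is the content of the step f_2 = 1_N in the proof.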


We conclude this section with another application of Theorem 2.5.11, again to a 
situation where something seems intuitively obvious, but we need to use Definition 
by Recursion to provide a rigorous definition. 


Example 2.5.13. We want to show that any finite set of real numbers has a greatest 
element and a least element. We will look at greatest elements; least elements behave 
entirely analogously, and we omit the details. 

Before we turn to the specific issue of finding the greatest element of finite sets, 
we define greatest elements in general. Let A ⊆ R be a set, and let c ∈ A. The number
c is a greatest element of A if x ≤ c for all x ∈ A.

The reader might find the term “maximal element” more familiar than “greatest
element,” but whereas these two terms coincide in the context of the real numbers, in
the more general context of partially ordered sets (also called posets), the concept of a 
“maximal element” is not the same as the concept of a “greatest element,” and so we 
will make use here of the latter term, because it corresponds to the above definition in 
all contexts; see [Blo10, Section 7.4] for definitions of these terms for posets. 

Not every subset of R has a greatest element, for example the open interval (1,2). 
However, if a set has a greatest element then it is unique, as seen in Exercise 2.5.17. 
Some other properties of greatest elements can be found in Exercise 2.5.18. 

It is easy to define the maximum of two real numbers at a time. Let x, y ∈ R. The
maximum of x and y, denoted max{x, y}, is defined by

max{x, y} = x, if x ≥ y,
            y, if x ≤ y.


The problem is providing a rigorous definition of the maximum of arbitrarily sized
finite sets of real numbers. More precisely, let p ∈ N, and let a_1, a_2, ..., a_p ∈ R.
We want to show that the set {a_1, ..., a_p} has a greatest element; if such a greatest
element exists, then it would be unique, and it would be denoted max{a_1, ..., a_p}.
(We note, however, that this uniqueness is as an element of the set {a_1, ..., a_p}, and
that the greatest element need not be represented by a unique number of the form
a_i; for example, if a_1 = 7, and a_2 = 0 and a_3 = 7, then {a_1, a_2, a_3} = {0, 7}, and
max{a_1, a_2, a_3} = 7, which is a unique number, but we observe that 7 is represented
by both a_1 and a_3.)

Simply writing the notation “max{a_1, ..., a_p}” does not alone suffice to show that
a greatest element of the set exists, because, as we have stated previously, the notion
of “...” is not rigorous. We avoid this problem by using Definition by Recursion, where
intuitively we find the maximum of a_1, ..., a_p by first finding the maximum of a_1 and
a_2, then finding the maximum of max{a_1, a_2} and a_3, and continuing in this fashion,
until we reach the end.

To avoid writing the convenient but informal notation “a_1, ..., a_p,” we use functions.
Let f: N → R be a function. We think of f(n) as a_n for all n ∈ N. (If we are
given only a finite sequence a_1, a_2, ..., a_p to begin with, we can define f(i) = a_i for
all i ∈ {1, ..., p}, and f(i) = a_p for all i ∈ N − {1, ..., p}.) If the reader is wondering
why we are willing to write the “...” in {1, ..., p} but not in a_1, ..., a_p, it is because
the former was given a rigorous definition in Definition 2.5.3.

Let t: R × N → R be defined by t((x, m)) = max{x, f(m + 1)} for all (x, m) ∈ R ×
N. By Theorem 2.5.11 there is a unique function g: N → R such that g(1) = f(1), and
that g(n + 1) = t((g(n), n)) for all n ∈ N. It follows that g(n + 1) = max{g(n), f(n +
1)} for all n ∈ N. Hence g(n + 1) ≥ g(n) and g(n + 1) ≥ f(n + 1) for all n ∈ N.

We claim that g(n) ∈ {f(i) | i ∈ {1, ..., n}} for all n ∈ N. We know that g(1) =
f(1), so clearly g(1) ∈ {f(i) | i ∈ {1, ..., 1}}. Next, suppose that g(k) ∈ {f(i) | i ∈
{1, ..., k}} for some k ∈ N. Then g(k + 1) = max{g(k), f(k + 1)}, and so g(k + 1) =
g(k) or g(k + 1) = f(k + 1). In either case we deduce that g(k + 1) ∈ {f(i) | i ∈ {1, ...,
k + 1}}. By induction we conclude that g(n) ∈ {f(i) | i ∈ {1, ..., n}} for all n ∈ N.

Next, let m ∈ N. We claim that g(m) ≥ f(i) for all i ∈ {1, ..., m}. Because g(n +
1) ≥ g(n) for all n ∈ {1, ..., m − 1}, it follows from Exercise 2.5.4 that g(m) ≥ g(i)
for all i ∈ {1, ..., m}. However, because g(n) ≥ f(n) for all n ∈ N, it follows that
g(m) ≥ f(i) for all i ∈ {1, ..., m}. Hence g(m) is a greatest element of {f(i) | i ∈
{1, ..., m}}. ♦
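
The recursion g(1) = f(1), g(n + 1) = max{g(n), f(n + 1)} used in this example can be traced on a concrete list. The following Python sketch is an illustration under our own naming (greatest_element and max2 are hypothetical helpers, and a Python list stands in for the finite sequence a_1, ..., a_p); it is not part of the formal argument.

```python
def greatest_element(values):
    """Compute g(p), where g(1) = f(1) and g(n + 1) = max{g(n), f(n + 1)}."""
    if not values:
        raise ValueError("need at least one number")

    def max2(x, y):
        # The maximum of two numbers at a time, as defined in the example.
        return x if x >= y else y

    g = values[0]                  # g(1) = f(1)
    for m in range(1, len(values)):
        g = max2(g, values[m])     # values[m] plays the role of f(m + 1)
    return g

print(greatest_element([7, 0, 7]))   # 7
```

With a_1 = 7, a_2 = 0 and a_3 = 7 the result is 7, which is an element of the set {0, 7} and is greater than or equal to each of its elements, as the two claims in the example require.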


In Exercise 2.5.19 it is similarly shown that any finite set of real numbers has a
sum; that is, that expressions of the form “a_1 + a_2 + ··· + a_n” are defined for arbitrary
n ∈ N, where a_1, ..., a_n ∈ R.


Reflections 


This section has a number of lengthy proofs of apparently obvious facts. The
reader has used x^n and its properties confidently for years; has never questioned the
definition of max{a_1, ..., a_p}; has a lot of familiarity with the Principle of Mathematical
Induction from previous courses; and has, without any hesitation, used Definition
by Recursion many times prior to reading this section. Moreover, induction and recursion
are algebraic in nature, and the reader might wonder why they are included
in such detail in a text on real analysis, which is about calculus rather than algebra.
The reason the material in this section is discussed in such detail is simply that all this 
material is used subsequently in this text for the treatment of topics that belong to the 
study of real analysis, and if we want to be sure that everything proved in real analysis 
follows from the basic properties of the real numbers with no hidden assumptions, we 
need to prove everything we use, even if intuitively obvious, and even if appearing to 
belong to a different branch of mathematics. 


Exercises 


Exercise 2.5.1. [Used in Example 10.4.16.] Let n ∈ N, and let a ∈ R. Suppose that
a > 1. Prove that a^n ≥ a for all n ∈ N.


Exercise 2.5.2. [Used in Exercise 2.5.3 and Exercise 2.5.4.] There are occasions when
we need to do “induction” to prove something about the first p natural numbers, but
not beyond that. In this exercise we will see that such a procedure works. Examples
of using this method of proof are found in Exercise 2.5.3 and Exercise 2.5.4.
Let p ∈ N, and let G ⊆ N. Suppose that


a. 1 ∈ G;
b. if n ∈ G and n ≤ p − 1, then n + 1 ∈ G.


Prove that {1, ..., p} ⊆ G.


Exercise 2.5.3. [Used throughout.] Let n ∈ N, and let a_1, a_2, ..., a_n ∈ R. Prove that
|a_1 + a_2 + ··· + a_n| ≤ |a_1| + |a_2| + ··· + |a_n|. [Use Exercise 2.5.2.]


Exercise 2.5.4. [Used in Example 2.5.13.] Let n ∈ N, and let a_1, a_2, ..., a_n ∈ R. Suppose
that a_i ≤ a_{i+1} for all i ∈ {1, ..., n − 1}. Prove that a_i ≤ a_n for all i ∈ {1, ...,
n}. [Use Exercise 2.5.2.]


Exercise 2.5.5. [Used in Example 5.9.7.] Prove that 


1 
Pee ene a a ) 


for all n ∈ N.


Exercise 2.5.6. [Used in Example 9.2.4.] Prove that 


for all n ∈ N.


Exercise 2.5.7. [Used in Example 9.2.4.] Prove that 


for all n ∈ N.


Exercise 2.5.8. [Used in Example 8.4.10.] Let n ∈ N, and let a_1, a_2, ..., a_n ∈ R. Suppose
that a_1 ≥ a_2 ≥ ··· ≥ a_n ≥ 0. Prove that 0 ≤ a_1 − a_2 + a_3 − a_4 + ··· + (−1)^{n+1} a_n ≤
a_1.


Exercise 2.5.9. [Used in Lemma 2.5.7.] Prove Lemma 2.5.7. 


Exercise 2.5.10. The sequence r_1, r_2, ... is defined by the conditions r_1 = 1, and
r_{n+1} = 4r_n + 7 for all n ∈ N. Prove that r_n = (1/3)(10 · 4^{n−1} − 7) for all n ∈ N.


Exercise 2.5.11. [Used in Lemma 2.5.9.] Prove Lemma 2.5.9 Part (2). 


Exercise 2.5.12. [Used in Theorem 2.8.10, Exercise 2.8.4, Example 9.2.4, Example 10.3.7, 
Theorem 10.3.8 and Theorem 10.5.2.] Let n € N. 


(1) Let x, y ∈ R. Prove that

y^n − x^n = (y − x)(y^{n−1} + y^{n−2}x + ··· + yx^{n−2} + x^{n−1}).

(2) Let x, y ∈ R. Prove that if 0 < x < y, then x^n < y^n.
(3) Let a, r ∈ R. Suppose that r ≠ 1. Prove that

a + ar + ar^2 + ··· + ar^{n−1} = a(1 − r^n)/(1 − r).


Exercise 2.5.13. [Used in Exercise 2.6.13, Lemma 2.8.1, Example 8.2.13 and Example 9.2.4.]


(1) Let q ∈ R. Suppose that q > 0. Prove that 1 + nq ≤ (1 + q)^n for all n ∈ N.
(2) Let p ∈ N. Suppose that p > 1. Prove that n < p^n for all n ∈ N.
[Use Exercise 2.4.3.]


Exercise 2.5.14. [Used in Exercise 3.4.6.] Let n ∈ N, and let [a_1, b_1], ..., [a_n, b_n] ⊆ R
be closed bounded intervals. Prove that ⋃_{i=1}^{n} [a_i, b_i] equals the union of finitely many
disjoint closed bounded intervals.


Exercise 2.5.15. [Used in Example 5.8.2.] Let [x, y] ⊆ R be a non-degenerate closed
bounded interval, let n ∈ N and let [a_1, b_1], ..., [a_n, b_n] ⊆ R be non-degenerate closed
bounded intervals. Prove that if [x, y] ⊆ ⋃_{i=1}^{n} [a_i, b_i], then y − x ≤ Σ_{i=1}^{n} (b_i − a_i).

This exercise can be simplified as follows. Without loss of generality, assume that
a_1 ≤ a_2 ≤ ··· ≤ a_{n+1}; if that is not the case, the indices can be renamed to make that
happen. Moreover, if [a_i, b_i] ⊆ [a_j, b_j] for some i, j ∈ {1, ..., n + 1}, then the interval
[a_i, b_i] could be dropped without any change to the hypothesis on [x, y]. Hence, without
loss of generality, assume that none of the intervals of the form [a_i, b_i] is a subset of
another such interval. [Use Exercise 2.3.13.]


Exercise 2.5.16. [Used in Theorem 2.5.11.] Prove the uniqueness of the function g in 
Theorem 2.5.11. 


Exercise 2.5.17. [Used in Example 2.5.13 and Lemma 8.3.8.] Let A ⊆ R be a set. Prove
that if A has a greatest element, then that greatest element is unique.


Exercise 2.5.18. [Used in Example 2.5.13 and Lemma 8.3.8.] Let A, B ⊆ R be sets.
Suppose that A ⊆ B.


(1) Prove that if A has a greatest element a and B has a greatest element b, then
a ≤ b.

(2) Suppose that B − A is a finite set. Prove that if A has a greatest element then B
has a greatest element.

(3) Find an example of sets A and B such that B − A is a finite set and B has a
greatest element, but that A does not have a greatest element.


Exercise 2.5.19. [Used throughout.] The binary operations addition and multiplication
are defined for only two numbers at a time; hence the word “binary.” On the
other hand, it is very convenient to use expressions of the form “a_1 + a_2 + ··· + a_n”
for arbitrary n ∈ N, where a_1, ..., a_n ∈ R. However, simply writing “···” and saying
that we add the numbers two at a time is not rigorous, and it is the purpose of this
exercise to give this use of “···” a rigorous definition. (A similar definition can be
given for the product of finitely many numbers, though we omit the details.)

The intuitive idea of this definition is as follows. Suppose we want to compute the
finite sum 3 + 7 + 2. Because addition is defined for only two numbers at a time, there
are two ways one might go about finding the sum 3 + 7 + 2, which are (3 + 7) + 2
and 3 + (7 + 2). Fortunately, because of the Associative Law for Addition, these two
sums are equal, and hence we can write 3 + 7 + 2 unambiguously. If we can define
the sum of three numbers unambiguously, then a similar argument will show that the
sum of four numbers can also be defined and so on.

For the formal definition, rather than writing a_1 + a_2 + ··· + a_n, it is easier to
use functions. Let f: N → R be a function. Use Theorem 2.5.11 to give a rigorous
definition for the expression “f(1) + f(2) + ··· + f(n)” for all n ∈ N.


Exercise 2.5.20. [Used in Example 8.4.10.] Let H be a non-empty set, let a, b ∈ H and
let p: H × H → H be a function. Prove that there is a unique function f: N → H such
that f(1) = a, that f(2) = b and that f(n + 2) = p((f(n), f(n + 1))) for all n ∈ N.

The idea is to apply Theorem 2.5.5 to the set H × H, the element (a, b) and the
function k: H × H → H × H defined by k((x, y)) = (y, p(x, y)) for all (x, y) ∈ H × H,
and then to use the result of that step to find the desired function f.


2.6 The Least Upper Bound Property and Its Consequences 


We have now reached the essence of the real numbers. The various properties of the 
real numbers we saw up till now in this chapter were proved using only the properties 
of an ordered field (Theorem 1.7.6 or Definition 2.2.1). As such, all the properties 
of the real numbers that were proved in the previous sections of this chapter would 
hold in any ordered field, for example the set of rational numbers. We now turn to 
those properties of the real numbers that make use of the Least Upper Bound Property 
(Theorem 1.7.9 or Definition 2.2.3). The Least Upper Bound Property is the property 
of the real numbers that distinguishes these numbers from all other ordered fields, and 
in particular from the rational numbers, and it is the Least Upper Bound Property that 
ultimately allows us to do calculus, which is what real analysis is about. 

Recall from Definition 1.7.7 or Definition 2.2.2 the concepts of bounded above, 
bounded below, bounded, upper bound, lower bound, least upper bound and greatest 
lower bound. Recall also Exercise 2.3.11, which states that a subset A ⊆ R is bounded
if and only if there is some M ∈ R such that M > 0 and that |x| ≤ M for all x ∈ A. We
now discuss these concepts further, starting with some examples. 


Example 2.6.1. 


(1) Let A = [3,5). Then 10 is an upper bound of A, and —100 is a lower bound. 
Hence A is bounded above and bounded below, and therefore A is bounded. Clearly 
15 is also an upper bound of A, and 0 is a lower bound of A. In fact, there are infinitely 
many other upper bounds and lower bounds of A. The least upper bound of A is 5, 
and the greatest lower bound of A is 3. Observe that A does not contain its least upper 
bound. On the other hand, if we let B = [3,5], then the least upper bound of B is 5, 
and B does contain its least upper bound.

(2) Let C = {1/n | n ∈ N}. Then C is bounded above by 2 and bounded below by
−1. The least upper bound of C is 1, and the greatest lower bound of C is 0.

(3) The set N is bounded below (a lower bound is 0), but it is not bounded above,
as the reader is asked to prove in Exercise 2.6.16, using tools that we will develop
later in this section. Hence N is not bounded. The greatest lower bound of N is 1, but
N has no least upper bound, because it has no upper bound. ♦


Whereas upper bounds and lower bounds of sets are not unique, as seen in 
Example 2.6.1 (1), the following lemma shows that least upper bounds and greatest 
lower bounds, when they exist, are unique. 


Lemma 2.6.2. Let A ⊆ R be a non-empty set.


1. If A has a least upper bound, the least upper bound is unique.
2. If A has a greatest lower bound, the greatest lower bound is unique.


Proof. We will prove Part (1); the other part is similar, and we omit the details.


(1) Suppose that M, N ∈ R are both least upper bounds of A. That means that M
and N are both upper bounds of A, and that M ≤ T and N ≤ T for all upper bounds T
of A. It follows that M ≤ N and N ≤ M, and therefore M = N.


Because of Lemma 2.6.2 we can now refer to “the least upper bound” and “the 
greatest lower bound” of a set, if they exist. Hence the following definition makes 
sense. 


Definition 2.6.3. Let A ⊆ R be a non-empty set. If A has a least upper bound, it is
denoted lub A. If A has a greatest lower bound, it is denoted glb A. △


In addition to the Least Upper Bound Property of the real numbers, there is also 
a corresponding—and equivalent—property for greatest lower bounds, called the 
Greatest Lower Bound Property, which we now prove. The reader who has read 
Section 1.7 has already seen the Greatest Lower Bound Property (Theorem 1.7.8), 
and can therefore skip the following theorem. 


Theorem 2.6.4 (Greatest Lower Bound Property). Let A ⊆ R be a set. If A is
non-empty and bounded below, then A has a greatest lower bound.


Proof. Suppose that A is non-empty and bounded below. Let

L = {b ∈ R | b is a lower bound of A}.

Then L ≠ ∅, because A is bounded below. Let x ∈ A. Then b ≤ x for all b ∈ L, and
hence x is an upper bound of L. Because L is non-empty and bounded above, we can
apply the Least Upper Bound Property to L to deduce that L has a least upper bound,
say m.

If y ∈ A, then d ≤ y for all d ∈ L. Therefore every element of A is an upper bound
of L. Because m is the least upper bound of L, it follows that m ≤ y for all y ∈ A.
Hence m is a lower bound of A. Because m is an upper bound of L, then d ≤ m for
every d ∈ L. We deduce that m is a greatest lower bound of A.


We now have a number of useful consequences of the Least Upper Bound Property
(and the Greatest Lower Bound Property), starting with the following lemma, which 
we will use repeatedly. Intuitively, this lemma says that if a set has a least upper bound, 
then it is possible to find elements of the set as close as one wants to the least upper 
bound. That is, even if a set does not contain its least upper bound, there is no “gap” 
between the set and the least upper bound, and similarly for greatest lower bounds. 


Lemma 2.6.5. Let A ⊆ R be a non-empty set, and let ε > 0.


1. Suppose that A has a least upper bound. Then there is some a ∈ A such that
lub A − ε < a ≤ lub A.

2. Suppose that A has a greatest lower bound. Then there is some b ∈ A such
that glb A ≤ b < glb A + ε.


Proof. We will prove Part (1); the other part is similar, and we omit the details. 


(1) Because lub A − ε < lub A, then lub A − ε cannot be an upper bound of A,
because lub A is the least upper bound. Hence there is some a ∈ A such that lub A − ε <
a. Because lub A is an upper bound of A, then a ≤ lub A.


We observe that in Lemma 2.6.5 (1) the inequality lub A − ε < a ≤ lub A cannot
in general be replaced with lub A − ε < a < lub A, because the latter inequality would
not hold, for example, when A has a single element; the analogous fact holds for
Part (2) of the lemma.

Our next result is another technically useful fact about least upper bounds and 
greatest lower bounds. This lemma generalizes the well-known Nested Interval Theo- 
rem (Theorem 8.4.7), but is in fact simpler than that result, because it does not require 
the language of sequences. We will use this lemma in the proof of Theorem 5.4.7, 
which is a very important result about integration, as well as in the proof of the Nested 
Interval Theorem. The idea of this lemma is that if the elements of one set of real 
numbers are all greater than or equal to the elements of another set of real numbers, 
and if elements of the two sets can be found as close to each other as desired, then 
there is no gap between the two sets. 


Lemma 2.6.6 (No Gap Lemma). Let A, B ⊆ R be non-empty sets. Suppose that if
a ∈ A and b ∈ B, then a ≤ b.


1. A has a least upper bound and B has a greatest lower bound, and lub A ≤ glb B.
2. lub A = glb B if and only if for each ε > 0, there are a ∈ A and b ∈ B such that
b − a < ε.


Proof. 


(1) Because A and B are both non-empty, then A is bounded above by any element
of B, and B is bounded below by any element of A. The Least Upper Bound Property
and the Greatest Lower Bound Property imply that A has a least upper bound and B
has a greatest lower bound.

Suppose that lub A > glb B. Let μ = (lub A − glb B)/2. Then μ > 0. We know by
Lemma 2.6.5 that there is some a ∈ A such that lub A − μ < a ≤ lub A, and there
is some b ∈ B such that glb B ≤ b < glb B + μ. It follows that

b < glb B + μ = glb B + (lub A − glb B)/2 = (lub A + glb B)/2
  = lub A − (lub A − glb B)/2 = lub A − μ < a,

which is a contradiction. Hence lub A ≤ glb B.


(2) By Part (1) of this lemma we know that lub A ≤ glb B.

Suppose that lub A = glb B. Let ε > 0. By Lemma 2.6.5 there are p ∈ A and q ∈ B
such that lub A − ε/2 < p ≤ lub A and glb B ≤ q < glb B + ε/2. Then lub A − ε/2 < p ≤
lub A = glb B ≤ q < glb B + ε/2 = lub A + ε/2, and it follows that q − p < ε.

Now suppose that lub A ≠ glb B. Let η = glb B − lub A. Then η > 0. If x ∈ A and
y ∈ B, then x ≤ lub A and y ≥ glb B, which implies y − x ≥ glb B − lub A = η. Hence
it is not the case that for each ε > 0, there are a ∈ A and b ∈ B such that b − a < ε,
for example if ε = η/2.


The reader has already encountered a number of basic results about the natural 
numbers in Section 1.2, 1.4 or 2.4. In contrast to those results, which were about 
the natural numbers in their own right, we now turn to a very important theorem, 
called the Archimedean Property of the real numbers, that illuminates how the natural 
numbers are situated inside the larger set of all real numbers. This theorem may 
seem so obvious that the reader might question the need for proving it, but in fact the 
proof is non-trivial, making use of the Least Upper Bound Property. To appreciate 
the value of the Archimedean Property, which we will use regularly, we note that 
there exist ordered fields for which the Archimedean Property does not hold; see 
[Olm62, Sections 711-713] for an example of such an ordered field. 


Theorem 2.6.7 (Archimedean Property). Let a, b ∈ R. Suppose that a > 0. Then
there is some n ∈ N such that b < na.


Proof. First, suppose that b ≤ 0. Let n = 1. It follows that b ≤ 0 < a = na. Second,
suppose that b > 0. We use proof by contradiction. Suppose that na ≤ b for all n ∈ N.
Let

A = {ka | k ∈ N}.


Then A ⊆ R. Because 1 · a ∈ A, we see that A ≠ ∅. By hypothesis we know that b is
an upper bound of A. The Least Upper Bound Property then implies that A has a least
upper bound. Let m ∈ N. Then m + 1 ∈ N, and hence (m + 1)a ∈ A, which implies
(m + 1)a ≤ lub A. Therefore ma + a ≤ lub A, and hence ma ≤ lub A − a. Because m
was arbitrarily chosen, we deduce that lub A − a is an upper bound of A, which is a
contradiction to the fact that lub A is the least upper bound of A.
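
The proof above is by contradiction and does not say how to find n. For rational inputs, a direct search illustrates what the theorem guarantees. The following Python sketch is only an illustration: the function name archimedean_witness is our own, and exact rational arithmetic (Fraction) stands in for the real numbers.

```python
from fractions import Fraction

def archimedean_witness(a, b):
    """Return the least natural number n with b < n * a, assuming a > 0."""
    a, b = Fraction(a), Fraction(b)
    if a <= 0:
        raise ValueError("the Archimedean Property requires a > 0")
    n = 1
    while not b < n * a:   # the loop ends because some n with b < n * a exists
        n += 1
    return n

print(archimedean_witness(Fraction(1, 1000), 5))   # 5001
print(archimedean_witness(3, -2))                  # 1
```

The second call corresponds to the first case of the proof, where b ≤ 0 and n = 1 works.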


Corollary 2.6.8. Let x € R. 


1. There is a unique n ∈ Z such that n − 1 ≤ x < n. If x ≥ 0, then n ∈ N.
2. If x > 0, there is some m ∈ N such that 1/m < x.

Proof.


(1) First we prove uniqueness. Suppose that there are n, m ∈ Z such that n − 1 ≤
x < n and m − 1 ≤ x < m. We proceed by contradiction. Suppose that n ≠ m. Without
loss of generality, assume that n < m. By Theorem 2.4.10 (1) we know that n + 1 ≤ m.
Hence n ≤ m − 1, and therefore x < n ≤ m − 1 ≤ x, which is a contradiction. We
deduce that n = m.

We now prove existence. First, suppose that x = 0. Then we can let n = 1.

Second, suppose that x > 0. We can then apply the Archimedean Property (Theorem 2.6.7)
to 1 and x to deduce that there is some k ∈ N such that x < k · 1, which
means that x < k. Let

B = {m ∈ N | x < m}.


Clearly B ⊆ N. We have just seen that k ∈ B, and hence B ≠ ∅. By the Well-Ordering
Principle (Theorem 1.2.10, Axiom 1.4.4 or Theorem 2.4.6) there is some n ∈ B such
that n ≤ m for all m ∈ B. The definition of B implies that x < n. By the minimality
of n we know that n − 1 ∉ B. If n − 1 ∈ N, then n − 1 ∉ B implies that n − 1 ≤ x. If
n − 1 ∉ N, then n = 1, and so n − 1 = 0, and hence by hypothesis on x we see that
n − 1 < x. Combining the two cases, we deduce that n − 1 ≤ x < n.

Third, suppose that x < 0. Then −x > 0. As we have just seen, there is some
m ∈ N such that m − 1 ≤ −x < m. Then −m < x ≤ −m + 1. There are now two
subcases. First, suppose that x = −m + 1. Then let n = −m + 2. Therefore x = n − 1,
and hence n − 1 ≤ x < n. Because m ∈ N, it follows that n ∈ Z. Second, suppose that
−m < x < −m + 1. Then let n = −m + 1, and hence n − 1 ≤ x < n. Because m ∈ N,
then once again n ∈ Z.


(2) Suppose that x > 0. Then 1/x > 0. We know by Part (1) of this corollary that
there is some m ∈ N such that m − 1 ≤ 1/x < m, and it follows that 1/m < x.
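
For rational x, the integer in Part (1) is one more than the floor of x, and Part (2) follows by applying Part (1) to 1/x, exactly as in the proof. The Python sketch below illustrates this under our own assumptions: the names integer_just_above and reciprocal_below are hypothetical, and Fraction is used so that the comparisons are exact.

```python
import math
from fractions import Fraction

def integer_just_above(x):
    """Return the unique integer n with n - 1 <= x < n (Corollary 2.6.8 (1))."""
    return math.floor(x) + 1

def reciprocal_below(x):
    """For x > 0, return a natural number m with 1/m < x (Corollary 2.6.8 (2))."""
    if x <= 0:
        raise ValueError("Part (2) assumes x > 0")
    return integer_just_above(1 / Fraction(x))

# The cases x = 0, x > 0 and x < 0 (including the subcase where x is an integer).
for x in (Fraction(0), Fraction(5, 2), Fraction(-2), Fraction(-3, 2)):
    n = integer_just_above(x)
    assert n - 1 <= x < n

m = reciprocal_below(Fraction(2, 7))
assert Fraction(1, m) < Fraction(2, 7)
print(m)   # 4
```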


We are now ready to fulfill a promise made in Section 2.4, where we said that 
we would show that there exist irrational numbers, a fact that is not obvious from 
the definition, because the set of irrational numbers is simply defined to be R—Q, 
and we have not yet proved that Q is not all of R. In particular, we will prove that
√2 exists in R, and after that we will show that √2 is not in Q. The reader might
be acquainted with the standard proof by contradiction that √2 is irrational using
fractions in “lowest terms” and even and odd numbers; see [Blo10, Theorem 2.3.4]
for this proof. We will use a different approach here, however, for two reasons. First,
we have not given a rigorous treatment of fractions in “lowest terms,” nor of even
and odd numbers (not because these topics are too difficult, but because we do not
need them for any other purpose), and hence we do not have these tools available to
us now. Second, and more importantly, the standard proof that √2 is irrational shows
only that there is no rational number x such that x^2 = 2, but it does not show that there
exists an irrational number x such that x^2 = 2, which is what we need in order to show
that R − Q is not empty. The reader might wonder whether something as powerful as
the Least Upper Bound Property is really needed to prove something as apparently
simple as showing that the equation x^2 − 2 = 0 has a solution in R, but once we leave
the realm of the rational numbers, we should not be surprised that we need to make
use of the one tool we have available for the real numbers that is not available for the
rational numbers, namely, the Least Upper Bound Property.

The following theorem states that any positive real number has a square root; 
proving that fact takes no more effort than showing that just the number 2 has a square 
root. 


Theorem 2.6.9. Let p ∈ (0, ∞). Then there is a unique x ∈ (0, ∞) such that x^2 = p.


Proof. We follow [Sto01]. We will prove existence in the case that p > 1. The rest of
the existence proof, and the uniqueness proof, are left to the reader in Exercise 2.6.18.
Suppose that p > 1. Then p^2 > p. Let

S = {w ∈ R | w > 0 and w^2 < p}.

Because 1^2 < p, then 1 ∈ S, and therefore S ≠ ∅. Let y ∈ R. Suppose that y > p. Then
y > 0, and hence y^2 > p^2 > p. It follows that y ∉ S. Therefore, if z ∈ S, then z ≤ p.
We deduce that p is an upper bound of S. Therefore S is bounded above. By the Least
Upper Bound Property, the set S has a least upper bound. Let x = lub S. Because 1 ∈ S,
then x ≥ 1 > 0.

Let

t = x − (x^2 − p)/(p + x).

The reader can verify that

t = p(x + 1)/(p + x)   and   t^2 − p = p(p − 1)(x^2 − p)/(p + x)^2.

It follows from the first equality that t > 0.
We now show that x^2 = p. Suppose to the contrary that x^2 ≠ p. First, suppose
that x^2 > p. Then t < x and t^2 > p. Let u ∈ S. Then u^2 < p, and hence u^2 < t^2. By
Lemma 2.3.3 (14) we deduce that u < t. Therefore t is an upper bound of S, which is
a contradiction to the fact that t < x and x is the least upper bound of S.
Second, suppose that x^2 < p. Then t > x and t^2 < p. Therefore t ∈ S, which is a
contradiction to the fact that t > x and x is an upper bound of S.
We conclude that x^2 = p.
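
The algebra behind the auxiliary number t can be checked exactly with rational arithmetic. The Python sketch below assumes that t is given by t = x − (x^2 − p)/(p + x), which is our reading of the display in the proof, and the function name step is our own. It checks, for p = 2 and a few rational values of x, the closed form for t, the formula for t^2 − p, and the two facts used in the case analysis.

```python
from fractions import Fraction

def step(p, x):
    """Return t = x - (x^2 - p)/(p + x), for p > 1 and x > 0."""
    return x - (x * x - p) / (p + x)

p = Fraction(2)
for x in (Fraction(1), Fraction(3, 2), Fraction(7, 5)):
    t = step(p, x)
    # The two identities the reader is asked to verify, checked exactly.
    assert t == p * (x + 1) / (p + x)
    assert t * t - p == p * (p - 1) * (x * x - p) / (p + x) ** 2
    # If x^2 > p then t < x and t^2 > p; if x^2 < p then t > x and t^2 < p.
    assert (x * x > p) == (t * t > p)
    assert (x * x > p) == (t < x)

print(step(p, Fraction(1)))   # 4/3
```

A finite check of this kind is not a proof of the identities, but it can catch a transcription error in the formulas before one works through the algebra by hand.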


Definition 2.6.10. Let p ∈ (0, ∞). The square root of p, denoted √p, is the unique
x ∈ (0, ∞) such that x^2 = p. △


There is a more pleasant, and less ad hoc, proof that every positive real number
has an $n$th root for every $n \in \mathbb{N}$ in Exercise 3.5.6, though this nicer proof requires the
Intermediate Value Theorem, which we have not yet proved. Even more generally,
in Section 7.2 we will define $x^r$ for all $x \in (0,\infty)$ and all $r \in \mathbb{R}$, which includes the
case of the $n$th root of $x$ by using $r = \frac{1}{n}$; once again, this more general definition
requires tools we have not yet developed, for example integration, which is used to
define logarithms, which are needed to define $x^r$. (It should be noted that both of
these alternative methods of constructing roots also ultimately rely upon the Least
Upper Bound Property.) However, even though these slicker, and more general,
definitions of the square root of positive real numbers exist, we have given the above
proof in order to have square roots available to use now, because we will use them on
occasion.

We now show that $\sqrt{p}$ is not rational for any natural number $p$ that is not a perfect
square. In particular, it will follow from Exercise 2.4.5 that $\sqrt{2}$ is not rational. The
following proof is due to Richard Dedekind (1831–1916), who invented what are now
called Dedekind cuts (which were discussed in Section 1.6).


Theorem 2.6.11. Let $p \in \mathbb{N}$. Suppose that there is no $u \in \mathbb{Z}$ such that $p = u^2$. Then
$\sqrt{p} \notin \mathbb{Q}$.


Proof. By Corollary 2.6.8 (1) there is a unique $v \in \mathbb{Z}$ such that $v - 1 \le \sqrt{p} < v$. By
hypothesis we know that $\sqrt{p} \notin \mathbb{Z}$, and hence $v - 1 < \sqrt{p} < v$. Let $s = v - 1$. Then
$s \in \mathbb{Z}$ and $s < \sqrt{p} < s + 1$. Because $p \ge 1$, then $\sqrt{p} \ge 1$, and hence $s \ge 1$.
Suppose that $\sqrt{p} \in \mathbb{Q}$. Then there are $a, b \in \mathbb{Z}$ with $b \ne 0$ such that $\sqrt{p} = \frac{a}{b}$. By
Lemma 2.4.12 (2) we can assume that $a, b \in \mathbb{N}$.
Let

$$E = \{d \in \mathbb{N} \mid \text{there is some } c \in \mathbb{Z} \text{ such that } \sqrt{p} = \tfrac{c}{d}\}.$$

Clearly $E \subseteq \mathbb{N}$. Because $\sqrt{p} = \frac{a}{b}$, then $b \in E$, so $E \ne \emptyset$. By the Well-Ordering Principle
(Theorem 1.2.10, Axiom 1.4.4 or Theorem 2.4.6) there is some $n \in E$ such that $n \le x$
for all $x \in E$. By the definition of $E$ there is some $m \in \mathbb{Z}$ such that $\sqrt{p} = \frac{m}{n}$. Because
$s < \sqrt{p} < s + 1$, it follows that $s < \frac{m}{n} < s + 1$, and hence, with a bit of rearranging,
we deduce that $0 < m - sn < n$.

It can be verified that $(np - sm)^2 - (m - sn)^2 p = (s^2 - p)(m^2 - n^2 p)$ by expanding
both sides of the equation; the details are left to the reader. Because $\sqrt{p} = \frac{m}{n}$, then
$m^2 - n^2 p = 0$, and it follows that $(np - sm)^2 - (m - sn)^2 p = 0$. Hence $\sqrt{p} = \frac{np - sm}{m - sn}$.
Because $m, n, s, p \in \mathbb{Z}$, then $m - sn \in \mathbb{Z}$ and $np - sm \in \mathbb{Z}$. Because $m - sn > 0$, then
$m - sn \in \mathbb{N}$. It follows that $m - sn \in E$, which is a contradiction to the fact that $n \le x$
for all $x \in E$, because $m - sn < n$. We conclude that $\sqrt{p} \notin \mathbb{Q}$.
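
The algebraic identity left to the reader in the proof above, and the bound $0 < m - sn < n$ that drives the contradiction, can be checked mechanically on small integers. The Python sketch below is an illustration only; the function names `identity_holds` and `smaller_denominator` and the sampled ranges are assumptions made here, and a finite check is of course no substitute for expanding both sides by hand.

```python
def identity_holds(n, m, s, p):
    """Check (np - sm)^2 - (m - sn)^2 p == (s^2 - p)(m^2 - n^2 p) over the integers."""
    lhs = (n * p - s * m) ** 2 - (m - s * n) ** 2 * p
    rhs = (s * s - p) * (m * m - n * n * p)
    return lhs == rhs

def smaller_denominator(n, m, s):
    """Return m - s*n, the new candidate denominator produced in the proof."""
    return m - s * n

# The identity is polynomial in n, m, s, p; sample a small grid of integers.
assert all(
    identity_holds(n, m, s, p)
    for n in range(1, 8)
    for m in range(-7, 8)
    for s in range(0, 4)
    for p in range(1, 8)
)

# Whenever s < m/n < s + 1 with n > 0 (that is, s*n < m < (s + 1)*n),
# the new denominator lies strictly between 0 and n.
for n in range(1, 20):
    for s in range(1, 5):
        for m in range(s * n + 1, (s + 1) * n):
            assert 0 < smaller_denominator(n, m, s) < n
```

The second loop is the computational shadow of the descent: any fraction $\frac{m}{n}$ strictly between $s$ and $s + 1$ yields a strictly smaller positive denominator, which is what contradicts the minimality of $n$ in $E$.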




The following corollary seems intuitively clear, and we now have the tools to 
prove it. 


Corollary 2.6.12. The ordered field Q does not satisfy the Least Upper Bound 
Property. 


Proof. Let

$$A = \{w \in \mathbb{Q} \mid w > 0 \text{ and } w^2 < 2\}.$$

As in the proof of Theorem 2.6.9, we see that $1 \in A$, which means that $A \ne \emptyset$, and that
2 is an upper bound of $A$, which means that $A$ is bounded above. Suppose that $A$ has a
least upper bound. Let $y = \operatorname{lub} A$. As the reader can verify, the proof of Theorem 2.6.9
works in the context of $\mathbb{Q}$, and it follows from that proof that $y^2 = 2$. Hence $y = \sqrt{2}$.
We then have a contradiction to Theorem 2.6.11 and Exercise 2.4.5, and we deduce
that $A$ does not have a least upper bound.

The set of integers is “discrete” in the sense that each integer is isolated from its
fellow numbers by a distance of at least 1; see Theorem 2.4.10 for details. By contrast,
we now show that the set of rational numbers and the set of irrational numbers are
as “indiscrete” as possible, in that numbers of each of these two types can be found
arbitrarily close to any given real number. More formally, we will prove that between
any two real numbers there exists a rational number and an irrational number. This
fact about the rational numbers and the irrational numbers, which relies upon
Corollary 2.6.8, and hence ultimately on the Least Upper Bound Property, is known as the
“density of the rational numbers” and the “density of the irrational numbers.”


Theorem 2.6.13. Let $a, b \in \mathbb{R}$. Suppose that $a < b$.


1. There is some $q \in \mathbb{Q}$ such that $a < q < b$.
2. There is some $r \in \mathbb{R} - \mathbb{Q}$ such that $a < r < b$.


Proof.

(1) We know by hypothesis that $b - a > 0$. By Corollary 2.6.8 (2) there is some
$n \in \mathbb{N}$ such that $\frac{1}{n} < b - a$. It follows that $an + 1 < bn$. By Corollary 2.6.8 (1) there
is some $m \in \mathbb{Z}$ such that $m - 1 \le an < m$. It follows that $m \le an + 1 < bn$. Hence
$an < m < bn$, and therefore $a < \frac{m}{n} < b$. By the definition of $\mathbb{Q}$, we know that $\frac{m}{n} \in \mathbb{Q}$.

(2) Because $\sqrt{2} > 0$, then $\frac{1}{\sqrt{2}} > 0$, and it follows that $\frac{a}{\sqrt{2}} < \frac{b}{\sqrt{2}}$. By Part (1) of this
theorem we know that there is some $q \in \mathbb{Q}$ such that $\frac{a}{\sqrt{2}} < q < \frac{b}{\sqrt{2}}$. We can choose $q$
so that it is not 0 (otherwise, if $q$ were 0, then we could pick another rational number
between $q$ and $\frac{b}{\sqrt{2}}$, and use that rational number). Then $a < q\sqrt{2} < b$. Because $q \ne 0$, it
follows from Exercise 2.4.6 (2) that $q\sqrt{2} \in \mathbb{R} - \mathbb{Q}$.


We conclude this section with a result known as the Heine–Borel Theorem, which
might appear to be a somewhat unmotivated technicality, but which will be very useful
in the proofs of two important theorems (Theorem 3.4.4 and Theorem 5.8.5). Often
the name “Heine–Borel Theorem” is used to refer to a more general result than what
is stated below, but what we have here is the core of the more general version. It would
take us too far afield to motivate the Heine–Borel Theorem, other than to say that
the combination of closedness and boundedness for intervals in $\mathbb{R}$ is so powerful that
it makes closed bounded intervals behave somewhat analogously to finite sets. For
the present, the reader should view the Heine–Borel Theorem as simply a technical
necessity, though the reader is encouraged to learn more about this theorem, and the
concepts involved. This topic may be found in any book on point set topology, as part
of the discussion of the concept of “compactness”; see [Mun00, Sections 26 and 27]
for details.


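Theorem 2.6.14 (Heine–Borel Theorem). Let $C \subseteq \mathbb{R}$ be a closed bounded interval,
let $I$ be a non-empty set and let $\{A_i\}_{i \in I}$ be a family of open intervals in $\mathbb{R}$. Suppose
that $C \subseteq \bigcup_{i \in I} A_i$. Then there are $n \in \mathbb{N}$ and $i_1, i_2, \ldots, i_n \in I$ such that $C \subseteq \bigcup_{k=1}^{n} A_{i_k}$.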


Proof. Let $C = [a,b]$. The result is trivial if $a = b$, so suppose that $a < b$.
Let

$$S = \Bigl\{r \in [a,b] \Bigm| \text{there are } p \in \mathbb{N} \text{ and } i_1, \ldots, i_p \in I \text{ such that } [a,r] \subseteq \bigcup_{k=1}^{p} A_{i_k}\Bigr\}.$$

Clearly $S \subseteq [a,b]$. It is evident that $a \in S$, and therefore $S \ne \emptyset$. The set $S$ is bounded
above by $b$. The Least Upper Bound Property implies that $S$ has a least upper bound.
Let $z = \operatorname{lub} S$. Then $a \le z \le b$.

Because $z \in [a,b]$, there is some $m \in I$ such that $z \in A_m$. The set $A_m$ is an open
interval, and hence Lemma 2.3.7 (2) implies that there is some $\delta > 0$ such that
$[z - \delta, z + \delta] \subseteq A_m$.

We now show that $z \in S$. If $z = a$, then we already know that $z \in S$. Now suppose
that $z > a$. By definition $z = \operatorname{lub} S$, and so Lemma 2.6.5 (1) implies that there is some
$w \in S$ such that $z - \delta < w \le z$. It follows that $[w,z] \subseteq A_m$. By the definition of $S$ there
are $p \in \mathbb{N}$ and $i_1, i_2, \ldots, i_p \in I$ such that $[a,w] \subseteq \bigcup_{k=1}^{p} A_{i_k}$. Then
$[a,z] = [a,w] \cup [w,z] \subseteq \bigcup_{k=1}^{p} A_{i_k} \cup A_m$. Hence $z \in S$.

Next, we show that $z = b$. Assume that $z \ne b$. Then $z < b$. Let $\eta = \min\{\delta, \frac{b - z}{2}\}$.
Then $\eta > 0$. Observe that $[z, z + \eta] \subseteq A_m$, and that $z + \eta \in [a,b]$. Then
$[a, z + \eta] = [a,z] \cup [z, z + \eta] \subseteq \bigcup_{k=1}^{p} A_{i_k} \cup A_m$. It follows that $z + \eta \in S$, which is a contradiction to
the fact that $z = \operatorname{lub} S$. We deduce that $z = b$. Because $z \in S$, we conclude that $b \in S$,
which completes the proof.
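
The proof above pushes the covered portion $[a, r]$ of the interval to the right until it reaches $b$. A similar left-to-right sweep can be sketched in Python when the family of open intervals is given as a finite list of pairs `(lo, hi)`. This is an illustration under that assumption only: for a finite list the existence of a finite subcover is immediate, so the sketch shows the mechanism of extending coverage, not the content of the theorem. The function name `finite_subcover` and the greedy choice of the interval reaching furthest right are choices made here, not taken from the text.

```python
def finite_subcover(a, b, intervals):
    """Pick open intervals (lo, hi) from `intervals` whose union contains [a, b].

    Returns the chosen intervals in left-to-right order, or None if the
    given intervals do not cover [a, b].
    """
    chosen = []
    z = a  # leftmost point of [a, b] not yet known to be covered
    while True:
        # Open intervals containing z; an endpoint equal to z does not count.
        candidates = [(lo, hi) for (lo, hi) in intervals if lo < z < hi]
        if not candidates:
            return None
        best = max(candidates, key=lambda iv: iv[1])
        chosen.append(best)
        if best[1] > b:
            return chosen
        z = best[1]  # [a, z) is now covered; z itself still needs an interval

cover = finite_subcover(0, 1, [(-0.5, 0.3), (0.2, 0.7), (0.1, 0.25), (0.6, 1.5)])
assert cover == [(-0.5, 0.3), (0.2, 0.7), (0.6, 1.5)]
```

Calling `finite_subcover(0, 1, [(0, 1)])` returns `None`, because the open interval $(0,1)$ contains neither endpoint of $[0,1]$.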


The Heine–Borel Theorem (Theorem 2.6.14) is about closed bounded intervals
only. As the reader is asked to show in Exercise 2.6.17, the statement analogous to
this theorem, but with an interval that is bounded and either open or half-open, or
with an interval that is unbounded and closed, is not true.

The proof of the Heine–Borel Theorem (Theorem 2.6.14) relies upon the Least
Upper Bound Property. More strongly, it turns out that the Heine–Borel Theorem
is equivalent to the Least Upper Bound Property, as is discussed in Section 3.5 and
proved in Theorem 3.5.4.


Reflections 


Some parts of this section appear to be complicated proofs of obvious facts, for
example the Archimedean Property and the existence of $\sqrt{2}$, and other parts appear
to be proofs of unmotivated technicalities, for example the No Gap Lemma and
the Heine–Borel Theorem. In fact, both types of proofs are quite typical of what is
encountered throughout this text, and throughout much of mathematics. Virtually all
the main theorems encountered in this text ultimately rely upon the Least Upper Bound
Property. Some of those theorems, for example the Intermediate Value Theorem, have
intuitively obvious statements, but have complicated proofs that rely directly upon
the Least Upper Bound Property, similarly to some parts of the present section. Other
theorems in the book, for example Lebesgue’s Theorem, have complicated proofs that
rely upon unmotivated technical results such as the Heine–Borel Theorem (which in
turn relies upon the Least Upper Bound Property). So, in both content and style, the
material of the present section is at the heart of what we will encounter throughout
this text.


Exercises 


Exercise 2.6.1. [Used in Exercise 2.6.11, Theorem 2.7.1, Lemma 5.4.6, Theorem 5.9.9
and Theorem 5.8.5.] Let $A, B \subseteq \mathbb{R}$ be non-empty sets. Suppose that $A \subseteq B$.


(1) Suppose that $B$ has a least upper bound. Prove that $A$ has a least upper bound,
and $\operatorname{lub} A \le \operatorname{lub} B$.

(2) Suppose that $B$ has a greatest lower bound. Prove that $A$ has a greatest lower
bound, and $\operatorname{glb} B \le \operatorname{glb} A$.


Exercise 2.6.2. [Used in Lemma 3.5.3, Theorem 5.4.11, Example 5.9.7 and
Exercise 5.9.1.] Let $A \subseteq \mathbb{R}$ be a non-empty set. Refer to Example 2.5.13 for the definition
of a greatest element of a set; the definition and properties of a least element of a set
are completely analogous.


(1) Prove that if $a \in A$ is a greatest element of $A$, then $A$ has a least upper bound
and $\operatorname{lub} A = a$.

(2) Prove that if $c \in \mathbb{R}$ is an upper bound of $A$ and $c \in A$, then $A$ has a least upper
bound and $c = \operatorname{lub} A$.

(3) Prove that if $b \in A$ is a least element of $A$, then $A$ has a greatest lower bound
and $\operatorname{glb} A = b$.

(4) Prove that if $d \in \mathbb{R}$ is a lower bound of $A$ and $d \in A$, then $A$ has a greatest
lower bound and $d = \operatorname{glb} A$.


Exercise 2.6.3. [Used in Theorem 2.8.10, Theorem 5.9.9, Theorem 5.9.10 and
Section 5.9.] Let $A, B \subseteq \mathbb{R}$ be non-empty sets.


(1) Suppose that $A$ and $B$ are bounded above, and that for each $b \in B$ and $\varepsilon > 0$,
there is some $a \in A$ such that $b - \varepsilon < a$. Prove that $\operatorname{lub} B \le \operatorname{lub} A$. In particular,
deduce that if for each $b \in B$, there is some $a \in A$ such that $b \le a$, then
$\operatorname{lub} B \le \operatorname{lub} A$.

(2) Suppose that $A$ and $B$ are bounded below, and that for each $b \in B$ and $\varepsilon > 0$,
there is some $a \in A$ such that $a < b + \varepsilon$. Prove that $\operatorname{glb} A \le \operatorname{glb} B$. In particular,
deduce that if for each $b \in B$, there is some $a \in A$ such that $a \le b$, then
$\operatorname{glb} A \le \operatorname{glb} B$.


Exercise 2.6.4. [Used in Theorem 3.5.3.] Let $A \subseteq \mathbb{R}$ be a non-empty set. Suppose that
$A$ is bounded above. Let

$$U = \{x \in \mathbb{R} \mid x \text{ is an upper bound of } A\},$$

and let $L = \mathbb{R} - U$. Prove that if $x \in L$ and $y \in U$, then $x < y$.


Exercise 2.6.5. [Used in Exercise 7.4.1.] Let $A \subseteq \mathbb{R}$ be a non-empty set. Let

$$-A = \{-x \mid x \in A\}.$$

(1) Prove that $-A$ is bounded below if and only if $A$ is bounded above.

(2) Prove that $-A$ is bounded above if and only if $A$ is bounded below.

(3) Prove that $-A$ has a greatest lower bound if and only if $A$ has a least upper
bound, and that if $A$ has a least upper bound then $\operatorname{glb}(-A) = -\operatorname{lub} A$.

(4) Prove that $-A$ has a least upper bound if and only if $A$ has a greatest lower
bound, and that if $A$ has a greatest lower bound then $\operatorname{lub}(-A) = -\operatorname{glb} A$.

(5) Use previous parts of this exercise to provide an alternative proof of the
Greatest Lower Bound Property (Theorem 2.6.4).


Exercise 2.6.6. [Used throughout.] Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $b \in \mathbb{R}$ be
an upper bound of $A$. Suppose that for each $\varepsilon > 0$, there is some $a \in A$ such that
$b - \varepsilon < a$. Prove that $b = \operatorname{lub} A$. (In practice, it is often more convenient to show that
$b - a < \varepsilon$ rather than the equivalent $b - \varepsilon < a$, but we stated the exercise as we did to
make it analogous to Lemma 2.6.5 (1).)


Exercise 2.6.7. [Used in Theorem 2.7.1.] Let $b \in \mathbb{R}$. Let $D = \{x \in \mathbb{Q} \mid x < b\}$. Prove
that $D$ has a least upper bound, and that $\operatorname{lub} D = b$. (This fact seems trivial intuitively,
but a proof is needed.) [Use Exercise 2.6.6.]


Exercise 2.6.8. [Used in Example 5.9.7.] Let $A, B \subseteq \mathbb{R}$ be non-empty sets, and let
$p \in \mathbb{R}$. Suppose that if $a \in A$ and $b \in B$, then $a < b$. Suppose that for each $\varepsilon > 0$, there
are $a \in A$ and $b \in B$ such that $p - \varepsilon < a < p < b < p + \varepsilon$. Prove that $A$ has a least upper
bound and $B$ has a greatest lower bound, and $p = \operatorname{lub} A = \operatorname{glb} B$. [Use Exercise 2.6.6.]


Exercise 2.6.9. [Used in Exercise 2.6.10, Theorem 2.7.1, Exercise 2.8.5, Exercise 5.4.16
and Theorem 5.9.10.] Let $A, B \subseteq \mathbb{R}$ be non-empty sets. Let $A + B$ and $AB$ be defined
by

$$A + B = \{a + b \mid a \in A \text{ and } b \in B\} \qquad\text{and}\qquad AB = \{ab \mid a \in A \text{ and } b \in B\}.$$

(1) For each $z \in \mathbb{R}$, let $C_z = \{y \in \mathbb{Q} \mid y < z\}$. Let $x, y \in \mathbb{R}$. Prove that
$C_{x+y} = C_x + C_y$.

(2) For each $z \in (0,\infty)$, let $C_z = \{y \in \mathbb{Q} \mid 0 < y < z\}$. Let $x, y \in (0,\infty)$. Prove that
$C_{xy} = C_x C_y$.

(3) Prove that if $A$ and $B$ have least upper bounds, then $A + B$ has a least upper
bound, and $\operatorname{lub}(A + B) = \operatorname{lub} A + \operatorname{lub} B$.

(4) Prove that if $A$ and $B$ have greatest lower bounds, then $A + B$ has a greatest
lower bound, and $\operatorname{glb}(A + B) = \operatorname{glb} A + \operatorname{glb} B$.

(5) Suppose that $A, B \subseteq (0,\infty)$. Prove that if $A$ and $B$ have least upper bounds,
then $AB$ has a least upper bound, and $\operatorname{lub} AB = (\operatorname{lub} A)(\operatorname{lub} B)$.
You may use, without proof, the fact that $(\operatorname{lub} A)(\operatorname{lub} B) - [\operatorname{lub} A + \operatorname{lub} B]x + x^2$
can be made as close as desired to $(\operatorname{lub} A)(\operatorname{lub} B)$ by a suitable choice of
positive $x$; this fact will follow from the continuity of polynomials, discussed
in Example 3.3.7 (1).


Exercise 2.6.10. [Used in Exercise 2.6.11, Exercise 5.4.8 and Theorem 5.5.1.] Let
$A \subseteq \mathbb{R}$ be a non-empty set. Suppose that $A$ has a least upper bound and a greatest lower
bound.

(1) Prove that $\operatorname{glb} A \le \operatorname{lub} A$.
(2) Let $x, y \in A$. Prove that $|x - y| \le \operatorname{lub} A - \operatorname{glb} A$.
(3) Prove that

$$\operatorname{lub}\{|x - y| \mid x, y \in A\} = \operatorname{lub} A - \operatorname{glb} A.$$

[Use Exercise 2.6.5 and Exercise 2.6.9.]
(4) Prove that $\operatorname{glb} A = \operatorname{lub} A$ if and only if $A$ has a single element.


Exercise 2.6.11. [Used in Exercise 3.2.18.] Let $(a,b) \subseteq \mathbb{R}$ be a non-degenerate open
bounded interval, and let $\{D_x\}_{x \in (a,b)}$ be a family of subsets of $\mathbb{R}$. Suppose that $D_x$
is non-empty and bounded for all $x \in (a,b)$, and that $s, t \in (a,b)$ and $s < t$ imply
$D_s \subseteq D_t$. For each $s \in (a,b)$ let $a_s = \operatorname{glb} D_s$ and $b_s = \operatorname{lub} D_s$. Let $A = \{a_s \mid s \in (a,b)\}$
and $B = \{b_s \mid s \in (a,b)\}$. Prove that $A$ has a least upper bound and $B$ has a greatest
lower bound, and that $\operatorname{lub} A \le \operatorname{glb} B$. [Use Exercise 2.6.1 and Exercise 2.6.10 (1).]


Exercise 2.6.12. [Used in Lemma 7.2.4.] Let $a, b \in \mathbb{R}$. Suppose that $a > 0$. Prove that
there is some $n \in \mathbb{N}$ such that $b \in [-na, na]$.


Exercise 2.6.13. [Used in Lemma 2.8.5, Theorem 2.8.6, Theorem 2.8.10 and
Exercise 8.4.11.] Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $x \in (0,\infty)$. Prove that there is some
$n \in \mathbb{N}$ such that $\frac{1}{p^n} < x$. [Use Exercise 2.5.13 (2).]


Exercise 2.6.14. [Used in Lemma 7.3.2, Section 7.3, Lemma 7.3.4, Exercise 7.3.2 and
Lemma 10.5.1.] Let $a, h \in \mathbb{R}$. Suppose that $h > 0$.


(1) Let $x \in \mathbb{R}$. Prove that there is a unique $n \in \mathbb{Z}$ such that
$a + (n - 1)h \le x < a + nh$.

(2) Let $x, y \in \mathbb{R}$. Suppose that there is no $n \in \mathbb{Z}$ such that $a + nh$ is strictly between
$x$ and $y$. Prove that $|x - y| \le h$.


Exercise 2.6.15. Let $A \subseteq \mathbb{Z}$ be a set. Suppose that $A$ has a least upper bound. Prove
that $\operatorname{lub} A \in \mathbb{Z}$.


Exercise 2.6.16. [Used in Example 2.6.1.] Prove that N is not bounded above as a 
subset of R. 


Exercise 2.6.17. [Used in Section 2.6.] Give examples to show that the statement
analogous to the Heine–Borel Theorem (Theorem 2.6.14), but with an interval $C$ that
is bounded and either open or half-open, or that is unbounded and closed, is not true.


Exercise 2.6.18. [Used in Theorem 2.6.9.] Fill in the missing details in the proof of
Theorem 2.6.9. That is, prove that $x$ exists when $0 < p \le 1$, and prove uniqueness for
all $p$.


2.7 Uniqueness of the Real Numbers

We have seen two approaches to the existence of the real numbers: In Chapter 1 the
real numbers were constructed from the rational numbers, and in the present chapter
the real numbers were taken axiomatically. In neither approach, however, was the
uniqueness of the real numbers discussed. We now show that the set of real numbers,
no matter how its existence is arrived at, is in fact unique.

Before we can prove that the set of real numbers is unique, we need to know 
what “uniqueness” means in this context. In the modern mathematical approach, 
what we care about in regard to the set of real numbers, or any other mathematical 
object, is how the object behaves, not “what it is.” The set of real numbers, however
conceptualized, behaves according to the properties of an ordered field that satisfies 
the Least Upper Bound Property; that is what we hypothesized in Axiom 2.2.4 when 
we took the real numbers axiomatically, and that is what we proved in Theorem 1.7.6 
and Theorem 1.7.9 when we constructed the real numbers from the rational numbers. 
To say that the real numbers are unique is to say that any two ordered fields that satisfy 
the Least Upper Bound Property behave the same way no matter how these ordered 
fields were defined. Given that the axiom for an ordered field and the definition 
of the Least Upper Bound Property are stated in terms of two binary operations 
(called “addition” and “multiplication”) and a relation (called “less than”), to say
that two ordered fields that satisfy the Least Upper Bound Property behave the same 
way means that the binary operations addition and multiplication, and the relation 
less than, in one of the ordered fields correspond exactly to the binary operations 
addition and multiplication, and the relation less than, in the other ordered field. Such 
a correspondence is achieved via a bijective function that “preserves” the two binary 
operations and the relation, as stated in the following theorem. For the reader who is 
familiar with rings and fields, we note that the type of function we want is called an 
order preserving ring isomorphism. 


Theorem 2.7.1 (Uniqueness of the Real Numbers). Let $R_1$ and $R_2$ be ordered fields
that satisfy the Least Upper Bound Property. Then there is a function $f\colon R_1 \to R_2$
that is bijective, and that satisfies the following properties. Let $x, y \in R_1$.


a. $f(x + y) = f(x) + f(y)$.
b. $f(xy) = f(x)f(y)$.
c. If $x < y$, then $f(x) < f(y)$.


It is important to note in the statement of Theorem 2.7.1 that the symbols “$+$,” “$\cdot$”
and “$<$” are used in two different contexts, and it is necessary to keep track of what
these symbols mean at all times. For example, when we write “$f(x + y) = f(x) + f(y)$,”
the expression “$x + y$” denotes addition in $R_1$, whereas “$f(x) + f(y)$” denotes addition
in $R_2$. It would be proper to write “$+_1$” and “$+_2$” respectively to denote the addition
operations in each of $R_1$ and $R_2$, but doing so would make things very difficult to read,
and so we prefer to write “$+$” to mean both addition operations, with the assumption
that everything will be clear from context, and similarly for multiplication and less
than. Additionally, we will use the same notation 0 and 1 in both $R_1$ and $R_2$.

The idea of the proof of Theorem 2.7.1 is as follows. Given that $R_1$ and $R_2$ both
satisfy the hypotheses of Axiom 2.2.4, it follows that everything that was proved
about $\mathbb{R}$ in this chapter prior to the current section also holds for each of $R_1$ and $R_2$.
In particular, there are analogs of $\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{Q}$ in each of $R_1$ and $R_2$, which we will
denote $\mathbb{N}_1$, $\mathbb{Z}_1$, $\mathbb{Q}_1$, and $\mathbb{N}_2$, $\mathbb{Z}_2$, $\mathbb{Q}_2$, respectively. In the proof of Theorem 2.7.1, we
will use the analogs for $\mathbb{N}_1$, $\mathbb{Z}_1$, $\mathbb{Q}_1$, $\mathbb{N}_2$, $\mathbb{Z}_2$ and $\mathbb{Q}_2$ of the theorems and exercises we
have proved for $\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{Q}$. We then define the function $f$ in stages, first on $\mathbb{N}_1$, then
on $\mathbb{Z}_1$, then on $\mathbb{Q}_1$ and finally on $R_1$. This definition by stages makes the proof of the
theorem somewhat lengthier than might be expected.

In the proof we will use the concept of the extension of a function. To remind the
reader of this concept, let $A$ and $B$ be sets, let $S \subseteq A$ be a subset and let $f\colon S \to B$ be a
function. An extension of $f$ to $A$ is any function $h\colon A \to B$ such that $h|_S = f$.


Proof of Theorem 2.7.1. We follow [Spi67, Chapter 29] and [Pow94, Appendix] for 
parts of this proof. 


Step 1. Using Definition by Recursion (Theorem 2.5.5) we see that there are
functions $g\colon \mathbb{N}_1 \to \mathbb{N}_2$ and $p\colon \mathbb{N}_2 \to \mathbb{N}_1$ such that $g(1) = 1$, and $g(n + 1) = g(n) + 1$
for all $n \in \mathbb{N}_1$, and that $p(1) = 1$, and $p(m + 1) = p(m) + 1$ for all $m \in \mathbb{N}_2$. Then
$(p \circ g)(1) = p(g(1)) = p(1) = 1$, and $(p \circ g)(n + 1) = p(g(n + 1)) = p(g(n) + 1) =
p(g(n)) + 1 = (p \circ g)(n) + 1$ for all $n \in \mathbb{N}_1$. Observe that $1_{\mathbb{N}_1}(1) = 1$, and $1_{\mathbb{N}_1}(n + 1) =
n + 1 = 1_{\mathbb{N}_1}(n) + 1$ for all $n \in \mathbb{N}_1$, where $1_{\mathbb{N}_1}\colon \mathbb{N}_1 \to \mathbb{N}_1$ is the identity map. It follows
from the uniqueness in Theorem 2.5.5 that $p \circ g = 1_{\mathbb{N}_1}$. A similar argument shows
that $g \circ p = 1_{\mathbb{N}_2}$. We deduce that $g$ and $p$ are inverses of each other, and hence each is
bijective.

We now prove by induction on $n$ that $g(m + n) = g(m) + g(n)$ for all $m, n \in \mathbb{N}_1$.
First, suppose that $n = 1$. Let $a \in \mathbb{N}_1$. Then $g(a + n) = g(a + 1) = g(a) + 1 = g(a) +
g(1) = g(a) + g(n)$. Now let $n \in \mathbb{N}_1$, and suppose that $g(m + n) = g(m) + g(n)$ for all
$m \in \mathbb{N}_1$. Let $b \in \mathbb{N}_1$. Then $g(b + (n + 1)) = g((b + n) + 1) = g(b + n) + 1 = g(b) +
g(n) + 1 = g(b) + g(n + 1)$. Hence, by induction on $n$, we deduce that $g(m + n) =
g(m) + g(n)$ for all $m \in \mathbb{N}_1$, for all $n \in \mathbb{N}_1$. Similar proofs can be used to show that
$g(mn) = g(m)g(n)$ for all $m, n \in \mathbb{N}_1$, and that if $m < n$ then $g(m) < g(n)$ for all
$m, n \in \mathbb{N}_1$; the details are left to the reader in Exercise 2.7.2.


Step 2. Let $h\colon \mathbb{Z}_1 \to \mathbb{Z}_2$ be defined by $h(n) = g(n)$ for all $n \in \mathbb{N}_1$, by $h(0) = 0$ and
by $h(n) = -g(-n)$ for all $n \in -\mathbb{N}_1$. By definition $h$ is an extension of $g$ to $\mathbb{Z}_1$. Observe
that for $n \in \mathbb{Z}_1$, it follows from the definition of $h$ that $h(n) \in \mathbb{N}_2$ if and only if $n \in \mathbb{N}_1$,
that $h(n) = 0$ if and only if $n = 0$, and that $h(n) \in -\mathbb{N}_2$ if and only if $n \in -\mathbb{N}_1$.

It is left to the reader in Exercise 2.7.3 to show that $h(m + n) = h(m) + h(n)$ for all
$m, n \in \mathbb{Z}_1$, that $h(mn) = h(m)h(n)$ for all $m, n \in \mathbb{Z}_1$, and that if $m < n$ then $h(m) < h(n)$
for all $m, n \in \mathbb{Z}_1$.

Because $m < n$ implies $h(m) < h(n)$ for all $m, n \in \mathbb{Z}_1$, it follows that $h$ is injective.

To show that $h$ is surjective, let $b \in \mathbb{Z}_2$. If $b \in \mathbb{N}_2$, then because $g$ is surjective
there is some $a \in \mathbb{N}_1$ such that $g(a) = b$, and hence $h(a) = b$. If $b = 0$, then $h(0) = b$.
If $b \in -\mathbb{N}_2$, then $-b \in \mathbb{N}_2$, and hence there is some $c \in \mathbb{N}_1$ such that $g(c) = -b$,
which implies that $h(-c) = -g(-(-c)) = -g(c) = -(-b) = b$. We deduce that $h$ is
surjective, and hence $h$ is bijective.


Step 3. Let $k\colon \mathbb{Q}_1 \to \mathbb{Q}_2$ be defined as follows. Let $x \in \mathbb{Q}_1$. Then $x = \frac{a}{b}$ for some
$a, b \in \mathbb{Z}_1$ such that $b \ne 0$. It follows from the construction of $h$ in Step 2 that $h(b) \ne 0$.
Then we let $k(x) = \frac{h(a)}{h(b)}$. To show that this definition makes sense, suppose that $\frac{c}{d} = \frac{s}{t}$
for some $c, d, s, t \in \mathbb{Z}_1$ such that $d \ne 0$ and $t \ne 0$. It then follows that $ct = sd$, and
hence by Step 2 we see that $h(c)h(t) = h(ct) = h(sd) = h(s)h(d)$, which then implies
that $\frac{h(c)}{h(d)} = \frac{h(s)}{h(t)}$. Hence $k$ is well-defined.

Let $n \in \mathbb{Z}_1$. Then $n = \frac{n}{1}$, and hence $k(n) = \frac{h(n)}{h(1)} = \frac{h(n)}{1} = h(n)$. It follows that $k$ is
an extension of $h$ to $\mathbb{Q}_1$.

Let $x, y \in \mathbb{Q}_1$. Then $x = \frac{a}{b}$ and $y = \frac{c}{d}$ for some $a, b, c, d \in \mathbb{Z}_1$ such that $b \ne 0$ and
$d \ne 0$. We know that $x + y = \frac{ad + bc}{bd}$ and $xy = \frac{ac}{bd}$, and hence we can use Step 2 to
deduce that

$$k(x + y) = \frac{h(ad + bc)}{h(bd)} = \frac{h(a)h(d) + h(b)h(c)}{h(b)h(d)} = \frac{h(a)}{h(b)} + \frac{h(c)}{h(d)} = k(x) + k(y)$$

and

$$k(xy) = \frac{h(ac)}{h(bd)} = \frac{h(a)h(c)}{h(b)h(d)} = \frac{h(a)}{h(b)} \cdot \frac{h(c)}{h(d)} = k(x)k(y).$$

Suppose that $x < y$. Then $\frac{a}{b} < \frac{c}{d}$. It follows from Exercise 2.4.9 that either $cb - ad > 0$
and $bd > 0$, or $cb - ad < 0$ and $bd < 0$. If $cb - ad > 0$ and $bd > 0$, then by Step 2
we see that $h(cb - ad) > h(0)$ and $h(bd) > h(0)$, which implies that $h(c)h(b) -
h(a)h(d) > 0$ and $h(b)h(d) > 0$; if $cb - ad < 0$ and $bd < 0$, it follows similarly that
$h(c)h(b) - h(a)h(d) < 0$ and $h(b)h(d) < 0$. By Exercise 2.4.9 again we deduce from
both cases that $\frac{h(a)}{h(b)} < \frac{h(c)}{h(d)}$, which implies that $k(x) < k(y)$.

Because $x < y$ implies $k(x) < k(y)$ for all $x, y \in \mathbb{Q}_1$, it follows that $k$ is injective. It
is left to the reader in Exercise 2.7.4 to show that $k$ is surjective.


Step 4. We start with two observations. First, let $z, w \in R_2$. Suppose that $z < w$. By
Theorem 2.6.13 (1) there is some $t \in \mathbb{Q}_2$ such that $z < t < w$. By Step 3 the function
$k$ is bijective, and so there is some $s \in \mathbb{Q}_1$ such that $k(s) = t$. Hence $z < k(s) < w$.
Second, let $x, y \in \mathbb{Q}_1$. We know by Step 3 that if $x < y$, then $k(x) < k(y)$. Conversely,
if $k(x) < k(y)$, then it must be the case that $x < y$, because if $x \not< y$, then $x = y$ or $y < x$,
in which case $k(x) = k(y)$ or $k(y) < k(x)$, which would mean that $k(x) \not< k(y)$.

Let $f\colon R_1 \to R_2$ be defined as follows. Let $x \in R_1$. Let $C_x = \{y \in \mathbb{Q}_1 \mid y < x\}$. By
Theorem 2.6.13 (1) there is some $q \in \mathbb{Q}_1$ such that $x - 1 < q < x$, and therefore $C_x \ne \emptyset$.
Hence $k(C_x) \ne \emptyset$. By Corollary 2.6.8 (1) there is some $n \in \mathbb{Z}_1$ such that $x < n$. Hence
$n$ is an upper bound of $C_x$. Let $u \in k(C_x)$. Then $u = k(v)$ for some $v \in C_x$. Therefore
$v < x < n$. It follows from Step 3 that $u = k(v) < k(n)$. Hence $k(n)$ is an upper bound
of $k(C_x)$. By the Least Upper Bound Property the set $k(C_x)$ has a least upper bound.
We then let $f(x) = \operatorname{lub} k(C_x)$.

Let $w \in \mathbb{Q}_1$. Let $z \in k(C_w)$. Then $z = k(p)$ for some $p \in C_w$. Therefore $p < w$. It
follows from Step 3 that $z = k(p) < k(w)$. Hence $k(w)$ is an upper bound of $k(C_w)$,
which means that $\operatorname{lub} k(C_w) \le k(w)$. Suppose that $\operatorname{lub} k(C_w) < k(w)$. By the first
observation made at the start of this step of the proof, there is some $s \in \mathbb{Q}_1$ such that
$\operatorname{lub} k(C_w) < k(s) < k(w)$. By the second observation, we see that $s < w$. Hence $s \in C_w$,
which implies that $k(s) \in k(C_w)$. Hence $k(s) \le \operatorname{lub} k(C_w)$, which is a contradiction.
We deduce that $\operatorname{lub} k(C_w) = k(w)$, and it follows that $f(w) = k(w)$. Therefore $f$ is an
extension of $k$ to $R_1$.

Let $x, y \in R_1$. In Exercise 2.6.9 we defined $A + B$ and $AB$ for any two sets
$A, B \subseteq \mathbb{R}$. By Part (1) of that exercise we know that $C_{x+y} = C_x + C_y$. We then see that
$k(C_{x+y}) = k(C_x + C_y) = k(C_x) + k(C_y)$, where the last equality can be deduced from
Step 3; the details are left to the reader. It then follows from Exercise 2.6.9 (3) that
$f(x + y) = \operatorname{lub} k(C_{x+y}) = \operatorname{lub}[k(C_x) + k(C_y)] = \operatorname{lub} k(C_x) + \operatorname{lub} k(C_y) = f(x) + f(y)$.


Let $u \in R_1$. Suppose that $u > 0$. Let $C_u^{+}$ be defined by $C_u^{+} = \{y \in \mathbb{Q}_1 \mid 0 < y < u\}$. As before,
the Least Upper Bound Property implies that the set $k(C_u^{+})$ has a least upper bound;
the details are left to the reader. Moreover, because $u > 0$, it can be verified that
$\operatorname{lub} k(C_u^{+}) = \operatorname{lub} k(C_u) = f(u)$; again, the details are left to the reader.

We now show that $f(xy) = f(x)f(y)$. There are five cases.

First, suppose that $x = 0$ or $y = 0$. Without loss of generality, assume that $y = 0$.
Because $0 \in \mathbb{Z}_1$, then $f(0) = k(0) = h(0) = 0$. Then $f(x \cdot 0) = f(0) = 0 = f(x) \cdot 0 =
f(x)f(0)$.

Second, suppose that $x > 0$ and $y > 0$. By using Exercise 2.6.9 (2), and a similar
argument as before, it is seen that $f(xy) = \operatorname{lub} k(C_{xy}^{+}) = \operatorname{lub}[k(C_x^{+})k(C_y^{+})] =
[\operatorname{lub} k(C_x^{+})] \cdot [\operatorname{lub} k(C_y^{+})] = f(x)f(y)$; the details are left to the reader.

Third, suppose that $x < 0$ and $y > 0$. Then $-x > 0$. Using Exercise 2.7.5 (2) we
then see that $f(xy) = f(-(-x)y) = -f((-x)y) = -f(-x)f(y) = f(x)f(y)$.

There are two other cases, which are when $x > 0$ and $y < 0$, and when $x < 0$ and
$y < 0$; they are similar to the previous case, and we omit the details.

Suppose that $x < y$. Then $C_x \subseteq C_y$, and hence $k(C_x) \subseteq k(C_y)$. It follows from
Exercise 2.6.1 (1) that $\operatorname{lub} k(C_x) \le \operatorname{lub} k(C_y)$. By Theorem 2.6.13 (1) there is some
$a \in \mathbb{Q}_1$ such that $x < a < y$, and by the same theorem again we see that there is some
$b \in \mathbb{Q}_1$ such that $x < a < b < y$. Let $w \in C_x$. Then $w < a < b$, and by Step 3 we see
that $k(w) < k(a) < k(b)$. Therefore $k(a)$ is an upper bound of $k(C_x)$, and we deduce
that $f(x) = \operatorname{lub} k(C_x) \le k(a)$. Because $b \in C_y$, then $k(b) \in k(C_y)$, which implies that
$k(b) \le \operatorname{lub} k(C_y) = f(y)$. We deduce that $f(x) \le k(a) < k(b) \le f(y)$.

Because $x < y$ implies $f(x) < f(y)$ for all $x, y \in R_1$, it follows that $f$ is injective.

We now show that $f$ is surjective. Let $b \in R_2$. If $b \in \mathbb{Q}_2$, then $b = k(c)$ for some
$c \in \mathbb{Q}_1$, and hence $b = f(c)$. Now, suppose that $b \in R_2 - \mathbb{Q}_2$. Let $D = \{y \in \mathbb{Q}_2 \mid y < b\}$.
By Exercise 2.6.7 we know that $\operatorname{lub} D = b$. By a similar argument to that used about
sets of the form $C_x$, we see that $D \ne \emptyset$, and that there is some $m \in \mathbb{Z}_2$ that is an upper
bound of $D$. Because $k$ is bijective, it has an inverse function $k^{-1}$. Let $E = k^{-1}(D)$,
and let $p = k^{-1}(m)$. Observe that $E \subseteq \mathbb{Q}_1$ and $p \in \mathbb{Z}_1$. Then $E \ne \emptyset$, and, by using the
second observation made at the start of this step of the proof, we see that $p$ is an upper
bound of $E$. By the Least Upper Bound Property we know that the set $E$ has a least
upper bound. Let $a = \operatorname{lub} E$.

Suppose that $a \in \mathbb{Q}_1$. Then $k(a) \in \mathbb{Q}_2$. It cannot be that $k(a) = b$, because $b \notin \mathbb{Q}_2$.
First, suppose that $k(a) < b$. Then, by the first observation made at the start of this
step of the proof, there is some $w \in \mathbb{Q}_1$ such that $k(a) < k(w) < b$. Hence $a < w$ by
the second observation. Moreover, because $k(w) \in \mathbb{Q}_2$, then $k(w) \in D$, and therefore
$w \in E$, which is a contradiction to the fact that $a$ is an upper bound of $E$. Second,
suppose that $b < k(a)$. Then there is some $q \in \mathbb{Q}_1$ such that $b < k(q) < k(a)$. Hence
$q < a$. Let $r \in E$. Then $k(r) \in D$, so $k(r) < b < k(q)$, and therefore $r < q$. It follows
that $q$ is an upper bound of $E$, which is a contradiction to the fact that $a = \operatorname{lub} E$. We
conclude that $a \notin \mathbb{Q}_1$.

We will show that $E = C_a$, and it will then follow that $f(a) = \operatorname{lub} k(C_a) =
\operatorname{lub} k(E) = \operatorname{lub} D = b$. Let $s \in E$. Then $s \le \operatorname{lub} E = a$. Observe that $s \ne a$, because
$s \in \mathbb{Q}_1$ and $a \notin \mathbb{Q}_1$. Hence $s < a$, and therefore $s \in C_a$. Let $t \in C_a$. Then $t \in \mathbb{Q}_1$ and
$t < a$. Because $t < a = \operatorname{lub} E$, then $t$ is not an upper bound of $E$, so there is some
$v \in E$ such that $t < v$. Then $k(t) < k(v)$. Because $k(v) \in D$, it follows that $k(v) < b$,
and therefore $k(t) < b$. Because $k(t) \in \mathbb{Q}_2$, then $k(t) \in D$, which implies $t \in E$. We
conclude that $E = C_a$, and hence we have proved that $f$ is surjective.


Reflections 


The proof in this section, though long, has no real surprises. The main lesson to be 
learned from this proof is not the details (though they are worth knowing), but rather 
the fact that if a proof this long is needed, then the result proved should not be taken 
for granted. That is, we were right to raise the question of the uniqueness of the real 
numbers. The reader is encouraged to ask similar questions—concerning uniqueness 
and other issues as well—about all new mathematical concepts she encounters. 


Exercises 


Exercise 2.7.1. Find an example of a set that satisfies all the axioms of an ordered 
field except for the Inverses Law for Multiplication, and that satisfies the Least Upper 
Bound Property. 


Exercise 2.7.2. [Used in Theorem 2.7.1.] Complete the missing parts of Step 1 of the
proof of Theorem 2.7.1. That is, let $m, n \in \mathbb{N}_1$, and prove that $g(mn) = g(m)g(n)$, and
that if $m < n$ then $g(m) < g(n)$. [Use Exercise 2.4.1.]


Exercise 2.7.3. [Used in Theorem 2.7.1.] Complete the missing parts of Step 2 of the
proof of Theorem 2.7.1. That is, let $m, n \in \mathbb{Z}_1$, and prove that $h(m + n) = h(m) + h(n)$,
that $h(mn) = h(m)h(n)$, and that if $m < n$ then $h(m) < h(n)$.


Exercise 2.7.4. [Used in Theorem 2.7.1.] Complete the missing part of Step 3 of the 
proof of Theorem 2.7.1. That is, prove that k is surjective. 


Exercise 2.7.5. [Used in Theorem 2.7.1.] Let $R_1$ and $R_2$ be ordered fields that satisfy
the Least Upper Bound Property, and let $p\colon R_1 \to R_2$ be a function. Suppose that
$p(x + y) = p(x) + p(y)$ for all $x, y \in R_1$.


(1) Prove that $p(0) = 0$.
(2) Prove that $p(-x) = -p(x)$ for all $x \in R_1$.


Exercise 2.7.6. Let $R_1$ and $R_2$ be ordered fields that satisfy the Least Upper Bound
Property. Prove that there is a unique function $f\colon R_1 \to R_2$ that satisfies the properties
stated in Theorem 2.7.1.


It is very important to distinguish between the real numbers per se and the way we
write them. For example, we know that $1 \in \mathbb{R}$, and hence $1 + 1 \in \mathbb{R}$. We standardly
denote the real number $1 + 1$ by the symbol 2. We then denote the number $1 + 1 + 1$
by the symbol 3. We could, in principle, denote each of the numbers


\[ 1, \quad 1+1, \quad 1+1+1, \quad 1+1+1+1, \quad \ldots \]


by a single, distinct symbol. Of course, we would need infinitely many distinct 
symbols to do so, and that would not be very convenient for practical calculations. To 
overcome this problem, a number of systems of writing numbers have been developed 
by various cultures throughout history, for example Roman numerals. Today, the most 
commonly used system for writing numbers is base 10 notation, also called decimal 
notation, which allows us to designate all real numbers using only ten distinct symbols, 
namely, the symbols 0,1,2,...,9, though we write these symbols in infinitely many 
different combinations. For example, we denote $1+1+1+1+1+1+1+1+1$ by
the symbol 9, and we denote $1+1+1+1+1+1+1+1+1+1$ by the symbol 10.
The base 10 system of writing numbers has proved to be very convenient, though 
we note that the use of the number 10, as opposed to some other number as the base, 
is quite arbitrary, where the number 10 was presumably chosen simply because we 
human beings have that many fingers and toes; there is no particular mathematical 
advantage to the use of the number 10. 

We are so used to thinking of the real numbers in terms of how we write them 
that we often take our system of writing numbers for granted, though in fact there 
are two very substantial questions that need to be asked about any system for writing 
numbers: Can every real number be represented in the given system, and if yes, is
the representation unique? It turns out, as we will see in Theorem 2.8.6 below, that 
every real number can in fact be written in the base 10 system, though it should not 
be taken as obvious. As for uniqueness in the base 10 system, it is almost true, with 
the exception of decimals that eventually become the number 9 repeating. Because 
there is nothing special about base 10, we will prove our results in the more general 
context of base p, where p is any natural number greater than 1. 

The formal definition of the base p representation of a real number will be given 
in Definition 2.8.7 below, after we prove some preliminary results. Although the use 
of base 10 notation is something that the reader learned at a very early age, it will 
turn out that a surprisingly large amount of effort is needed to formulate and prove 
that everything works out as expected—we will use the Well-Ordering Principle, the 
Least Upper Bound Property and the Archimedean Property. 

Base 10 notation for the real numbers makes use of powers of 10. For example, if
we write 235 in base 10 notation, we mean the number $2 \cdot 10^2 + 3 \cdot 10 + 5 \cdot 1$. Similarly,
the base p representation of the real numbers makes use of powers of p, and we will
therefore need to make use of Definition 2.5.6 and Definition 2.5.8.
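
To make the place value idea concrete, here is a minimal Python sketch that evaluates a finite list of base p digits as a sum of powers of p. The function name `value_of_digits` and the convention of listing the most significant digit first are our own assumptions for illustration, not notation from the text.

```python
def value_of_digits(digits, p):
    """Evaluate digits (most significant first) as a number in base p.

    For example, [2, 3, 5] in base 10 means 2*10**2 + 3*10 + 5*1.
    """
    if p <= 1:
        raise ValueError("the base p must be a natural number greater than 1")
    total = 0
    for d in digits:
        if not 0 <= d < p:
            raise ValueError("each digit must lie in {0, ..., p-1}")
        # Shifting the running total by one place and adding the next digit
        # builds up the same sum of powers of p, one digit at a time.
        total = total * p + d
    return total
```

For instance, `value_of_digits([2, 3, 5], 10)` agrees with `2 * 10**2 + 3 * 10 + 5`.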

Our development of the base p representation of the real numbers will have two 
stages: first we deal with the natural numbers, and only after that will we turn to real 
numbers that are not natural numbers. Also, we will restrict our attention to positive 
numbers, because the base p representation of a negative number is just the negative
of the base p representation of its absolute value, and the base p representation of 
zero is just zero. 

The following simple lemma is the essence of what makes the base p representa- 
tion of natural numbers work. 


Lemma 2.8.1. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $n \in \mathbb{N}$. Then there is a unique
$k \in \mathbb{N}$ such that $p^{k-1} \le n < p^k$.


Proof. Let
\[ G = \{ c \in \mathbb{N} \mid n < p^c \}. \]


Clearly $G \subseteq \mathbb{N}$. By Exercise 2.5.13 (2) we know that $n < p^n$, and hence $n \in G$.
Therefore $G \ne \emptyset$. By the Well-Ordering Principle (Theorem 1.2.10, Axiom 1.4.4 or
Theorem 2.4.6), there is some $k \in G$ such that $k \le g$ for all $g \in G$. Because $k \in G$
we know that $n < p^k$. By the choice of $k$, we see that $k - 1 \notin G$, which means that
$p^{k-1} \le n$.
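
The proof of Lemma 2.8.1 picks out the least element of the set $G$. A minimal Python sketch of that search is given below; the name `digit_count` is hypothetical, and the loop simply tests $k = 1, 2, 3, \ldots$ in turn, which is one straightforward way to find the least element and is not meant as an efficient method.

```python
def digit_count(n, p):
    """Return the k in N with p**(k-1) <= n < p**k, for n >= 1 and p > 1."""
    if p <= 1 or n < 1:
        raise ValueError("need p > 1 and n >= 1")
    k = 1
    # Find the least k with n < p**k, that is, the least element of
    # G = {c in N | n < p**c} from the proof.
    while n >= p ** k:
        k += 1
    return k
```

When $p = 10$ this is the number of decimal digits of $n$; for example, 235 lies between $10^2$ and $10^3$.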


We now see that every natural number can be written uniquely in base p notation. 
We note that in the base 10 system, we express any real number in terms of the 
numbers 0,1,2,...,9 arranged in the appropriate place value system. For base p, we 
use the numbers 0,1,2,...,p—1. 

In the proof of the following theorem, and subsequently, if there is a summation
of the form $\sum_{i=r}^{s} a_i$, and if $r > s$, we take the summation to be zero; doing so allows
us to avoid special cases.


Theorem 2.8.2. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $n \in \mathbb{N}$. Then there are unique
$k \in \mathbb{N}$ and $a_0, a_1, \ldots, a_{k-1} \in \{0, \ldots, p-1\}$ such that $a_{k-1} \ne 0$, and that


\[ n = \sum_{i=0}^{k-1} a_i p^i. \tag{2.8.1} \]


Proof. To prove uniqueness, suppose that there are $t, r \in \mathbb{N}$ and $c_0, c_1, \ldots, c_{t-1} \in
\{0, \ldots, p-1\}$ and $b_0, b_1, \ldots, b_{r-1} \in \{0, \ldots, p-1\}$ such that $c_{t-1} \ne 0$ and $b_{r-1} \ne 0$,
and that
\[ n = \sum_{i=0}^{t-1} c_i p^i \quad \text{and} \quad n = \sum_{j=0}^{r-1} b_j p^j. \]
Without loss of generality, assume that $t \le r$. The above equations involving $n$ then
imply that
\[ \sum_{i=0}^{t-1} (b_i - c_i) p^i + \sum_{j=t}^{r-1} b_j p^j = 0. \]
Because $c_0, c_1, \ldots, c_{t-1} \in \{0, \ldots, p-1\}$ and $b_0, b_1, \ldots, b_{r-1} \in \{0, \ldots, p-1\}$, then
$|b_i - c_i| \le p-1$ for all $i \in \{0, \ldots, t-1\}$, and $|b_j| \le p-1$ for all $j \in \{t, \ldots, r-1\}$.
It now follows from Exercise 2.8.4 (3) that $b_i - c_i = 0$ for all $i \in \{0, \ldots, t-1\}$, and
that $b_j = 0$ for all $j \in \{t, \ldots, r-1\}$. However, we know by hypothesis that $b_{r-1} \ne 0$,
and hence it must be the case that $t = r$. It then follows that $b_i = c_i$ for all $i \in \{0, \ldots,
t-1\}$, and the proof of uniqueness is complete.

We now prove existence by induction on $n$, where we use the variant of proof by
induction given in Theorem 2.5.4. First, suppose that $n = 1$. Then $n = 1 \cdot p^0$. Let $k = 1$
and $a_0 = 1$, and then Equation 2.8.1 is satisfied when $n = 1$.

Next, let $n \in \mathbb{N}$. Suppose that $n > 1$, and that the desired result holds for all natural
numbers less than $n$. By Lemma 2.8.1 there is some $k \in \mathbb{N}$ such that $p^{k-1} \le n < p^k$.
Let

\[ S = \{ j \in \{0, \ldots, p-1\} \mid j p^{k-1} \le n \}. \]


Clearly $S \subseteq \{0, \ldots, p-1\}$. We know that $1 \in S$, and hence $S \ne \emptyset$. The set $S$ is a finite
set of real numbers, and hence it has a greatest element, as proved in Example 2.5.13.
Let $a_{k-1}$ denote this greatest element. Then $a_{k-1} p^{k-1} \le n < (a_{k-1}+1) p^{k-1}$. It follows
that $0 \le n - a_{k-1} p^{k-1} < p^{k-1}$.

There are now two cases. First, suppose that $n - a_{k-1} p^{k-1} = 0$. Then $n = a_{k-1} p^{k-1}$.
We can therefore let $a_0 = a_1 = \cdots = a_{k-2} = 0$, and then Equation 2.8.1 is satisfied.

Second, suppose that $0 < n - a_{k-1} p^{k-1}$. Because each of $a_{k-1}$ and $p^{k-1}$ is an
integer, then so is $n - a_{k-1} p^{k-1}$, and hence $n - a_{k-1} p^{k-1} \in \mathbb{N}$. Because $a_{k-1} > 0$ and
$p^{k-1} > 0$, then $n - a_{k-1} p^{k-1} < n$. We can therefore apply the inductive hypothesis
to $n - a_{k-1} p^{k-1}$, and we deduce that there are $v \in \mathbb{N}$ and $d_0, d_1, \ldots, d_{v-1} \in \{0, \ldots,
p-1\}$ such that $d_{v-1} \ne 0$, and that


\[ n - a_{k-1} p^{k-1} = \sum_{i=0}^{v-1} d_i p^i. \tag{2.8.2} \]


We claim that $v < k$. Suppose to the contrary that $v \ge k$. It then follows that
$d_{v-1} p^{v-1} \ge 1 \cdot p^{k-1}$, and hence $\sum_{i=0}^{v-1} d_i p^i \ge p^{k-1}$, which would contradict the fact
that $n - a_{k-1} p^{k-1} < p^{k-1}$. Hence $v < k$. We now let $a_i = d_i$ for all $i \in \{0, \ldots, v-1\}$,
and $a_i = 0$ for all $i \in \{v, \ldots, k-2\}$. Equation 2.8.1 is then a rearrangement of
Equation 2.8.2.
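
The existence part of the proof above is constructive, and can be sketched in Python as follows. This is only an illustration under our own naming (`base_p_digits` is a hypothetical name); it processes the places from $p^{k-1}$ down to $p^0$ in a loop rather than by the induction used in the proof, choosing at each place the greatest digit that does not overshoot what remains.

```python
def base_p_digits(n, p):
    """Return [a_0, a_1, ..., a_{k-1}] with n == sum(a_i * p**i) and a_{k-1} != 0."""
    if p <= 1 or n < 1:
        raise ValueError("need p > 1 and n >= 1")
    # Find the k of Lemma 2.8.1, so that p**(k-1) <= n < p**k.
    k = 1
    while n >= p ** k:
        k += 1
    digits = [0] * k
    remaining = n
    for i in range(k - 1, -1, -1):
        # Greatest j in {0, ..., p-1} with j * p**i <= remaining; this plays
        # the role of the greatest element of the set S in the proof.
        a = max(j for j in range(p) if j * p ** i <= remaining)
        digits[i] = a
        remaining -= a * p ** i
    return digits
```

Note that the list is indexed so that `digits[i]` is the coefficient of $p^i$; reading it in reverse gives the usual way of writing the number.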


We now turn to the much trickier question of the base p representation of arbitrary
positive real numbers. When we write a real number in decimal notation, for example
$\pi = 3.14159\ldots$, we have an infinite collection of digits after the decimal point. Such
an infinite collection of numbers is called a sequence, a concept that we will discuss
in detail in Chapter 8. For now, it is sufficient to think of a sequence of numbers
as an infinite list of the form $a_1, a_2, a_3, a_4, \ldots$. It is the presence of infinitely many
numbers that makes the base p representation of arbitrary real numbers much more
complicated than the representation of natural numbers.

When we write $\pi = 3.14159\ldots$, we mean that


\[ \pi = 3 + \frac{1}{10} + \frac{4}{10^2} + \frac{1}{10^3} + \frac{5}{10^4} + \frac{9}{10^5} + \cdots. \]


The difficulty with dealing with such an expression is that it involves adding infinitely 
many numbers at a time. Addition, as we have seen it in the axioms for the real 
numbers, is defined for only two numbers at a time; using Definition by Recursion 
it is possible to add any finite set of numbers at a time, as seen in Exercise 2.5.19,
but that approach does not extend to adding infinitely many numbers at a time. An
infinite sum of the form written for $\pi$ above is called a series, a concept that we
will discuss in detail in Chapter 9. As the reader will see in that chapter, not every 
series actually has a sum; those that do are called “convergent” series. It turns out 
that every base p representation of a real number, when viewed as a series, is in fact 
convergent, though that is not an obvious fact, and involves ideas that will be seen 
in Chapter 9; see Exercise 9.3.8 (1) for details. However, even though we have not 
yet discussed the convergence of series, we can nonetheless give a rigorous treatment 
of base p representation of arbitrary real numbers in the present section because it 
is possible to replace the use of series with the use of least upper bounds, a concept 
with which we are by now quite familiar. We follow (with added detail) the treatment 
via least upper bounds found in [Gor02, Section 1.3] and [Ros68, Section II.3]. After 
learning some facts about series in Sections 9.2 and 9.3, it will be left to the reader in 
Exercise 9.3.8 and Exercise 9.3.9 to provide simplified proofs of some results of the 
present section; that simplicity is deceptive, however, because it relies upon learning 
some new concepts first. 

We start with the following lemma, which will allow us to make use of the Least
Upper Bound Property. For convenience, we will sometimes write $p^{-n}$ and at other
times write $\frac{1}{p^n}$.


Lemma 2.8.3. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$.
Then the set
\[ S = \Bigl\{ \sum_{i=1}^{n} a_i p^{-i} \Bigm| n \in \mathbb{N} \Bigr\} \]
is bounded below by 0 and is bounded above by 1.


Proof. Let $n \in \mathbb{N}$, and let $x = \sum_{i=1}^{n} a_i p^{-i}$. Because $a_i \ge 0$ for all $i \in \{1, \ldots, n\}$, and
because $p > 0$, it follows that $x \ge 0$. Hence $S$ is bounded below by 0. By
Exercise 2.8.4 (2) we see that $x = \sum_{i=1}^{n} a_i p^{-i} \le 1 - \frac{1}{p^n} < 1$. Hence $S$ is bounded above
by 1.


Because of Lemma 2.8.3, we can make the following definition by appealing 
to the Least Upper Bound Property. Although we will use the infinite summation 
notation that is standardly used for series in the following definition, at this point 
we are thinking of this infinite summation strictly as formal notation, and we are not 
actually adding infinitely many things at a time in the following definition. 


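Definition 2.8.4. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$.
The sum $\sum_{i=1}^{\infty} a_i p^{-i}$ is defined by


\[ \sum_{i=1}^{\infty} a_i p^{-i} = \operatorname{lub} \Bigl\{ \sum_{i=1}^{n} a_i p^{-i} \Bigm| n \in \mathbb{N} \Bigr\}. \qquad \triangle \]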
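
To see Lemma 2.8.3 and Definition 2.8.4 at work on a small example, the following Python sketch computes the finite sums $\sum_{i=1}^{n} a_i p^{-i}$ exactly, using the standard `fractions` module. The function name `partial_sums` and the choice of six digits equal to 9 are assumptions made only for this illustration; a finite computation of this kind can suggest, but of course not prove, what the least upper bound is.

```python
from fractions import Fraction


def partial_sums(digits, p):
    """Yield the exact sums a_1/p + ... + a_n/p**n for n = 1, ..., len(digits)."""
    total = Fraction(0)
    for i, a in enumerate(digits, start=1):
        total += Fraction(a, p ** i)
        yield total


# Partial sums for the digits 9, 9, 9, 9, 9, 9 in base 10:
# 9/10, 99/100, 999/1000, and so on.
sums_nines = list(partial_sums([9] * 6, 10))
```

Each of these sums lies in the interval $[0,1)$ and each is larger than the one before, in agreement with the lemma; the $n$-th one equals $1 - 10^{-n}$.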


The following lemma gives some basic properties of $\sum_{i=1}^{\infty} a_i p^{-i}$. To obtain an
intuitive feel for why these facts are true, the reader should think of the case $p = 10$,
and should think of $\sum_{i=1}^{\infty} a_i p^{-i}$ as the infinite decimal $0.a_1 a_2 a_3 \cdots$, although we
formally introduce that notation only later in Definition 2.8.7.


Lemma 2.8.5. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$.
1. $0 \le \sum_{i=1}^{\infty} a_i p^{-i} \le 1$.
2. $\sum_{i=1}^{\infty} a_i p^{-i} = 0$ if and only if $a_i = 0$ for all $i \in \mathbb{N}$.
3. $\sum_{i=1}^{\infty} a_i p^{-i} = 1$ if and only if $a_i = p-1$ for all $i \in \mathbb{N}$.
4. Let $m \in \mathbb{N}$. Suppose that $m > 1$, and that $a_{m-1} \ne p-1$. Then


\[ \sum_{i=1}^{\infty} a_i p^{-i} \le \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}+1}{p^{m-1}}, \]


where equality holds if and only if $a_i = p-1$ for all $i \in \mathbb{N}$ such that $i \ge m$.


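Proof. We will prove Parts (2) and (4), leaving the rest to the reader in Exercise 2.8.3.


(2) If $a_i = 0$ for all $i \in \mathbb{N}$, then $\sum_{i=1}^{n} a_i p^{-i} = 0$ for all $n \in \mathbb{N}$, and hence
$\sum_{i=1}^{\infty} a_i p^{-i} = \operatorname{lub}\{0\} = 0$. If $a_k \ne 0$ for some $k \in \mathbb{N}$, then $\sum_{i=1}^{k} a_i p^{-i} > 0$, and hence
$\sum_{i=1}^{\infty} a_i p^{-i} = \operatorname{lub}\{\sum_{i=1}^{n} a_i p^{-i} \mid n \in \mathbb{N}\} > 0$.

(4) Let


\[ Q = \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}+1}{p^{m-1}} \quad \text{and} \quad T = \Bigl\{ \sum_{i=1}^{n} a_i p^{-i} \Bigm| n \in \mathbb{N} \Bigr\}. \]


By definition $\sum_{i=1}^{\infty} a_i p^{-i} = \operatorname{lub} T$. We will show that $\operatorname{lub} T \le Q$, and that $\operatorname{lub} T = Q$ if
and only if $a_i = p-1$ for all $i \in \mathbb{N}$ such that $i \ge m$.

Let $n \in \mathbb{N}$. First, suppose that $n \le m-2$. Then $\sum_{i=1}^{n} a_i p^{-i} \le \sum_{i=1}^{m-2} a_i p^{-i} < Q$.
Second, suppose that $n > m-2$. Then $n \ge m-1$ by Theorem 2.4.10 (1). Using
Exercise 2.8.4 (2) we see that


\[ \sum_{i=1}^{n} a_i p^{-i} = \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}}{p^{m-1}} + \sum_{i=m}^{n} a_i p^{-i} \le \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}}{p^{m-1}} + \frac{1}{p^{m-1}} - \frac{1}{p^{n}} < Q. \]


Combining these two cases, we see that $Q$ is an upper bound of $T$. Hence $\operatorname{lub} T \le Q$.

We now show that $\operatorname{lub} T = Q$ if and only if $a_i = p-1$ for all $i \in \mathbb{N}$ such that $i \ge m$.
Suppose that there is some $r \in \mathbb{N}$ such that $r \ge m$ and $a_r \ne p-1$. Because $a_r \in \{0, \ldots,
p-1\}$, it follows that $a_r \le p-2$, and hence $p-1-a_r \ge 1$.

Let $s \in \mathbb{N}$. We will show that $\sum_{i=1}^{s} a_i p^{-i} < Q - p^{-r}$. It will then follow that $Q - p^{-r}$
is an upper bound of $T$, which means that $\operatorname{lub} T \le Q - p^{-r} < Q$, which implies that
$\operatorname{lub} T \ne Q$. There are two cases.

First, suppose that $s < r$. Using the fact that $0 \le a_i \le p-1$ for all $i \in \mathbb{N}$, together with
Exercise 2.8.4 (2), we see that


\[
\begin{aligned}
Q - \sum_{i=1}^{s} a_i p^{-i} &\ge Q - \sum_{i=1}^{r-1} a_i p^{-i} \\
&= \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}+1}{p^{m-1}} - \sum_{i=1}^{r-1} a_i p^{-i} \\
&= \Bigl[ \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}}{p^{m-1}} - \sum_{i=1}^{m-1} a_i p^{-i} \Bigr] + \frac{1}{p^{m-1}} - \sum_{i=m}^{r-1} a_i p^{-i} \\
&\ge \frac{1}{p^{m-1}} - \Bigl[ \frac{1}{p^{m-1}} - \frac{1}{p^{r-1}} \Bigr] = \frac{1}{p^{r-1}}.
\end{aligned}
\tag{2.8.3}
\]


Because $p^{-(r-1)} > p^{-r}$, it follows that $\sum_{i=1}^{s} a_i p^{-i} < Q - p^{-r}$.
Second, suppose that $s \ge r$. Hence $s \ge r \ge m$. Using reasoning similar to the
previous case, we see that


\[
\begin{aligned}
Q - \sum_{i=1}^{s} a_i p^{-i} &= \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}+1}{p^{m-1}} \\
&\qquad - \Bigl[ \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1}}{p^{m-1}} + \sum_{i=m}^{r-1} a_i p^{-i} + \frac{a_r}{p^r} + \sum_{i=r+1}^{s} a_i p^{-i} \Bigr] \\
&\ge \frac{1}{p^{m-1}} - \Bigl[ \frac{1}{p^{m-1}} - \frac{1}{p^{r-1}} \Bigr] - \frac{a_r}{p^r} - \Bigl[ \frac{1}{p^r} - \frac{1}{p^s} \Bigr] \\
&= \frac{p-1-a_r}{p^r} + \frac{1}{p^s} > \frac{1}{p^r},
\end{aligned}
\]


where the last inequality holds because $p-1-a_r \ge 1$. It follows that $\sum_{i=1}^{s} a_i p^{-i} <
Q - p^{-r}$. Combining the two cases, we deduce that $\operatorname{lub} T \ne Q$.

Now, suppose that $a_i = p-1$ for all $i \in \mathbb{N}$ such that $i \ge m$. We saw above that
$Q$ is an upper bound of $T$. Let $\varepsilon > 0$. By Exercise 2.6.13 there is some $k \in \mathbb{N}$ such
that $\frac{1}{p^k} < \varepsilon$. Taking a larger value of $k$ will not change this inequality, and so we may
assume that $k \ge m$. Using the fact that $a_i = p-1$ for all $i \in \mathbb{N}$ such that $i \ge m$ together
with the second half of Exercise 2.8.4 (2), we see that the same reasoning used in
Equation 2.8.3 shows that in the present case $Q - \sum_{i=1}^{k} a_i p^{-i} = \frac{1}{p^k} < \varepsilon$. It then follows
from Exercise 2.6.6 that $Q = \operatorname{lub} T$.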
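
As a numerical sanity check of Part (4) of Lemma 2.8.5 (not a substitute for the proof), the following Python sketch computes the gap $Q - \sum_{i=1}^{k} a_i p^{-i}$ exactly for a finite list of digits. The name `gap_to_q` and the sample digits are hypothetical choices of ours; the list entry `digits[i - 1]` plays the role of $a_i$.

```python
from fractions import Fraction


def gap_to_q(digits, p, m, k):
    """Return Q - (a_1/p + ... + a_k/p**k), with Q as in Lemma 2.8.5 (4).

    Requires m > 1 and k <= len(digits); digits[i - 1] stands for a_i.
    """
    # Q is the sum of a_i p**(-i) for i = 1, ..., m-2, plus (a_{m-1} + 1) / p**(m-1).
    q = sum((Fraction(digits[i - 1], p ** i) for i in range(1, m - 1)), Fraction(0))
    q += Fraction(digits[m - 2] + 1, p ** (m - 1))
    partial = sum((Fraction(digits[i - 1], p ** i) for i in range(1, k + 1)), Fraction(0))
    return q - partial
```

With $p = 10$, $m = 2$ and digits $4, 9, 9, 9, 9, 9$, we have $Q = 5/10$ and the gap after $k = 6$ digits is $10^{-6}$, which illustrates why $0.4999\ldots$ and $0.5$ name the same number. If instead the third digit is changed to 8, so that $r = 3$ in the notation of the proof, the gap stays above $10^{-3}$.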


We now come to our main theorem regarding the base p representation of real 
numbers, which says that every positive real number has such a representation, and 
that such a representation is unique if we avoid representations that eventually become 
the number (p — 1) repeating, which is the analog of the number 9 repeating in the 
decimal system. The idea of the proof of this theorem is not difficult, but the details 
are somewhat lengthy. 


Theorem 2.8.6. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $x \in (0, \infty)$.


1. There are $k \in \mathbb{N}$, and $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots, p-1\}$ and $a_1, a_2, a_3, \ldots \in
\{0, \ldots, p-1\}$, such that


\[ x = \sum_{j=0}^{k-1} b_j p^j + \sum_{i=1}^{\infty} a_i p^{-i}. \tag{2.8.4} \]

2. It is possible to choose $k \in \mathbb{N}$, and $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots, p-1\}$, and
$a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$ in Part (1) of this theorem such that there is
no $m \in \mathbb{N}$ such that $a_i = p-1$ for all $i \in \mathbb{N}$ such that $i \ge m$.

3. If $x \ge 1$, then it is possible to choose $k \in \mathbb{N}$, and $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots,
p-1\}$, and $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$ in Part (1) of this theorem such that
$b_{k-1} \ne 0$. If $0 < x < 1$, then it is possible to choose $k = 1$, and $b_0 = 0$, and
$a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$ in Part (1) of this theorem.

4. If the conditions of Parts (2) and (3) of this theorem hold, then the numbers
$k \in \mathbb{N}$, and $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots, p-1\}$, and $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$
in Part (1) are unique.


Proof. We will prove Parts (1) and (4), leaving the rest to the reader in Exercise 2.8.6.


(1) By Corollary 2.6.8 (1) there is a unique $n \in \mathbb{N}$ such that $n-1 \le x < n$. Let
$a_0 = n-1$. Hence $a_0 \ge 0$ and $a_0 \le x < a_0 + 1$.

We start by defining the numbers $k \in \mathbb{N}$ and $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots, p-1\}$.
If $a_0 = 0$, let $k = 1$ and $b_0 = 0$. Then $a_0 = \sum_{j=0}^{k-1} b_j p^j$. Now suppose that $a_0 > 0$.
Then $a_0 \in \mathbb{N}$. We can then apply Theorem 2.8.2 to $a_0$ to deduce that there is a
unique $k \in \mathbb{N}$, and unique $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots, p-1\}$ such that $b_{k-1} \ne 0$, and
that $a_0 = \sum_{j=0}^{k-1} b_j p^j$.

Next, we define the numbers $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$. Actually, we use
Definition by Recursion to define numbers $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$ and $z_1, z_2, z_3, \ldots
\in \mathbb{R}$; the numbers $z_1, z_2, z_3, \ldots$ are not of interest per se, but they help us define
$a_1, a_2, a_3, \ldots$. (To be precise, we are using Definition by Recursion to define the single
sequence $(a_1, z_1), (a_2, z_2), (a_3, z_3), \ldots$ in $\{0, \ldots, p-1\} \times \mathbb{R}$, though for convenience
we do not write it that way.)

First, let $z_1 = x - a_0$. Then $0 \le z_1 < 1$, and hence $0 \cdot p^{-1} \le z_1 < p \cdot p^{-1}$. Let


\[ S_1 = \{ j \in \{0, \ldots, p-1\} \mid j p^{-1} \le z_1 \}. \]


Then $0 \in S_1$, and so $S_1 \ne \emptyset$. The set $S_1$ is finite, and hence it has a greatest element. Let
$a_1$ denote this greatest element. Because $a_1 \in S_1$, then $a_1 p^{-1} \le z_1$. If $a_1 \ne p-1$, then
the maximality of $a_1$ implies that $z_1 < (a_1+1) p^{-1}$; if $a_1 = p-1$, then $z_1 < p \cdot p^{-1} =
(a_1+1) p^{-1}$. Hence $a_1 p^{-1} \le z_1 < (a_1+1) p^{-1}$, and therefore $0 \le z_1 - a_1 p^{-1} < p^{-1}$.

Next, let $z_2 = z_1 - a_1 p^{-1}$. Then $0 \le z_2 < p^{-1}$, and hence $0 \cdot p^{-2} \le z_2 < p \cdot p^{-2}$. Let


\[ S_2 = \{ j \in \{0, \ldots, p-1\} \mid j p^{-2} \le z_2 \}. \]


As before, we know that $0 \in S_2$, and so $S_2 \ne \emptyset$, and that the finite set $S_2$ has a greatest
element. Let $a_2$ denote this greatest element. As before, it is seen that $a_2 p^{-2} \le z_2 <
(a_2+1) p^{-2}$; we omit the details.

Next, let $z_3 = z_2 - a_2 p^{-2} = z_1 - (a_1 p^{-1} + a_2 p^{-2})$. Then $0 \le z_3 < p^{-2}$. Similarly
to what we did for $a_1$ and $a_2$, we can find $a_3 \in \{0, \ldots, p-1\}$ such that $a_3 p^{-3} \le z_3 <
(a_3+1) p^{-3}$.

We continue in this fashion, obtaining numbers $z_1, z_2, z_3, \ldots \in \mathbb{R}$ and $a_1, a_2, a_3, \ldots
\in \{0, \ldots, p-1\}$ such that for all $n \in \mathbb{N}$, it is the case that

\[ z_{n+1} = z_1 - \sum_{i=1}^{n} a_i p^{-i} \quad \text{and} \quad 0 \le z_{n+1} < p^{-n}. \tag{2.8.5} \]


Let
\[ T = \Bigl\{ \sum_{i=1}^{n} a_i p^{-i} \Bigm| n \in \mathbb{N} \Bigr\}. \]
Let $n \in \mathbb{N}$. Equation 2.8.5 implies that $z_1 - \sum_{i=1}^{n} a_i p^{-i} = z_{n+1} \ge 0$, and therefore
$\sum_{i=1}^{n} a_i p^{-i} \le z_1$. Hence $z_1$ is an upper bound of $T$.

Let $\varepsilon > 0$. By Exercise 2.6.13 there is some $m \in \mathbb{N}$ such that $p^{-m} = \frac{1}{p^m} < \varepsilon$. Using
Equation 2.8.5 again, we see that $z_1 - \sum_{i=1}^{m} a_i p^{-i} = z_{m+1} < p^{-m} < \varepsilon$. It now follows
from Exercise 2.6.6 that $z_1 = \operatorname{lub} T$. Hence $z_1 = \sum_{i=1}^{\infty} a_i p^{-i}$.

Because $z_1 = x - a_0$ and $a_0 = \sum_{j=0}^{k-1} b_j p^j$, we deduce that


\[ x = a_0 + z_1 = \sum_{j=0}^{k-1} b_j p^j + \sum_{i=1}^{\infty} a_i p^{-i}. \]

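(4) Suppose that there are $k, u \in \mathbb{N}$, and $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots, p-1\}$ and
$c_0, c_1, \ldots, c_{u-1} \in \{0, \ldots, p-1\}$, and $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$ and $e_1, e_2, e_3, \ldots \in
\{0, \ldots, p-1\}$ such that


\[ x = \sum_{j=0}^{k-1} b_j p^j + \sum_{i=1}^{\infty} a_i p^{-i} \quad \text{and} \quad x = \sum_{j=0}^{u-1} c_j p^j + \sum_{i=1}^{\infty} e_i p^{-i}. \]


Suppose further that there is no $m \in \mathbb{N}$ such that $a_i = p-1$ for all $i \in \mathbb{N}$ such that
$i \ge m$, or that $e_i = p-1$ for all $i \in \mathbb{N}$ such that $i \ge m$. Suppose also that if $x \ge 1$, then
$b_{k-1} \ne 0$ and $c_{u-1} \ne 0$, and if $0 < x < 1$, then $k = u = 1$ and $b_0 = c_0 = 0$, as in
Part (3) of the theorem.

From the above hypotheses it follows that


\[ \sum_{j=0}^{k-1} b_j p^j - \sum_{j=0}^{u-1} c_j p^j = \sum_{i=1}^{\infty} e_i p^{-i} - \sum_{i=1}^{\infty} a_i p^{-i}. \tag{2.8.6} \]


We know by Lemma 2.8.5 (1) (3) that each of $\sum_{i=1}^{\infty} e_i p^{-i}$ and $\sum_{i=1}^{\infty} a_i p^{-i}$ is in the
interval $[0,1)$. It follows that $\bigl| \sum_{i=1}^{\infty} e_i p^{-i} - \sum_{i=1}^{\infty} a_i p^{-i} \bigr| < 1$. Equation 2.8.6 then implies
that $\bigl| \sum_{j=0}^{k-1} b_j p^j - \sum_{j=0}^{u-1} c_j p^j \bigr| < 1$. Each of $\sum_{j=0}^{k-1} b_j p^j$ and $\sum_{j=0}^{u-1} c_j p^j$ is an integer, and
it now follows from Theorem 2.4.10 (3) that $\sum_{j=0}^{k-1} b_j p^j = \sum_{j=0}^{u-1} c_j p^j$.

Suppose that $x \ge 1$. Then $\sum_{j=0}^{k-1} b_j p^j = \sum_{j=0}^{u-1} c_j p^j \ne 0$. Therefore $\sum_{j=0}^{k-1} b_j p^j \in \mathbb{N}$
and $\sum_{j=0}^{u-1} c_j p^j \in \mathbb{N}$, and the uniqueness in the statement of Theorem 2.8.2 implies that
$k = u$, and that $b_j = c_j$ for all $j \in \{0, \ldots, k-1\}$. Next, suppose that $0 < x < 1$. Then
by hypothesis $k = 1$ and $b_0 = 0$, and $u = 1$ and $c_0 = 0$, so that $k = u$ and $b_0 = c_0$.

Because $\sum_{j=0}^{k-1} b_j p^j = \sum_{j=0}^{u-1} c_j p^j$, it follows from Equation 2.8.6 that $\sum_{i=1}^{\infty} a_i p^{-i} =
\sum_{i=1}^{\infty} e_i p^{-i}$. We now use Exercise 2.8.7 to conclude that $a_i = e_i$ for all $i \in \mathbb{N}$.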
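
The recursion used in Part (1) of the proof above can be sketched in Python for a rational input, where exact arithmetic is available through the standard `fractions` module. The function name `fractional_digits` is a hypothetical choice of ours, and the sketch only produces finitely many of the digits $a_1, a_2, a_3, \ldots$; the internal `assert` checks the bound on $z_{n+1}$ from Equation 2.8.5 at each step.

```python
from fractions import Fraction


def fractional_digits(z1, p, count):
    """Return [a_1, ..., a_count] for z1 in [0, 1), following the proof's recursion."""
    if p <= 1 or not 0 <= z1 < 1:
        raise ValueError("need p > 1 and 0 <= z1 < 1")
    digits = []
    z = Fraction(z1)  # z holds z_n at the start of step n
    for n in range(1, count + 1):
        # a_n is the greatest j in {0, ..., p-1} with j * p**(-n) <= z_n.
        a = max(j for j in range(p) if Fraction(j, p ** n) <= z)
        digits.append(a)
        z -= Fraction(a, p ** n)  # z_{n+1} = z_n - a_n * p**(-n)
        assert 0 <= z < Fraction(1, p ** n)  # the bound in Equation 2.8.5
    return digits
```

For example, with $p = 10$ the input $1/3$ yields the digits $3, 3, 3, \ldots$, and the input $1/2$ yields $5, 0, 0, \ldots$ rather than $4, 9, 9, \ldots$.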


We are now, finally, ready to make the following definition. 


Definition 2.8.7. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $x \in (0, \infty)$. A base p
representation of the number $x$ is an expression of the form $x = b_{k-1} \cdots b_1 b_0 . a_1 a_2 a_3 \cdots$,
where $k \in \mathbb{N}$ and $b_0, b_1, \ldots, b_{k-1} \in \{0, \ldots, p-1\}$ and $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$
are such that


\[ x = \sum_{j=0}^{k-1} b_j p^j + \sum_{i=1}^{\infty} a_i p^{-i}. \qquad \triangle \]


We can now restate Theorem 2.8.6 by saying that any positive real number has a 
base p representation, and that such a representation is unique subject to the conditions 
stated in Parts (2) and (3) of the theorem. 

We conclude this section with the one fundamental issue regarding base p rep- 
resentations that we have not yet addressed, which is characterizing the base p 
representation of rational numbers. It should be familiar to the reader that the decimal 
expansion of a rational number is either terminating or eventually repeating. (Actu- 
ally, this characterization is redundant, because a “terminating” decimal expansion 
is simply one that eventually has repeating zeros, but we will maintain the standard 
phraseology.) The analogous fact holds for the base p representation of rational num- 
bers for all p, as we will show in Theorem 2.8.10 below. First, however, we need to 
state and prove the following theorem, which is known as the “Division Algorithm,” 
although it is not an algorithm, but is rather an existence theorem (the name of the 
theorem is a historical artifact). The Division Algorithm is a very important tool in a 
number of branches of mathematics (for example number theory), though in this text 
we will be using it only in the proof of Theorem 2.8.10, and hence we have included 
it in this section. 

To understand the Division Algorithm, think of how one learns to divide natural 
numbers in elementary school. Suppose that we want to divide 27 by 4. It is seen
that 4 goes into 27 six times, so that the quotient is 6, and there is a remainder of 3.
In other words, we write $\frac{27}{4} = 6 + \frac{3}{4}$, which for convenience can also be written as
$27 = 6 \cdot 4 + 3$. How did we find the quotient and the remainder? The idea is that we
wanted to find as many whole copies of 4 in 27 as possible, and we see that there
are 6 copies, because $6 \cdot 4$ is less than 27, but $7 \cdot 4$ is greater than 27. The remainder
is what was left over when we subtracted $6 \cdot 4$ from 27. As such, we see that the
remainder must be less than 4, or else we could have increased the quotient. The 
Division Algorithm is just a general statement that in all such situations, there is a 
unique quotient and remainder. 


Theorem 2.8.8 (Division Algorithm). Let $a \in \mathbb{N} \cup \{0\}$ and $b \in \mathbb{N}$. Then there are
unique $q, r \in \mathbb{N} \cup \{0\}$ such that $a = bq + r$ and $0 \le r < b$.


Proof. To prove uniqueness, suppose that there are $q, p, r, s \in \mathbb{Z}$ such that $a = bq + r$
and $a = bp + s$, and that $0 \le r < b$ and $0 \le s < b$. There are two cases.

First, suppose that $q = p$. Because $bq + r = bp + s$, it follows that $r = s$.

Second, suppose that $q \ne p$. Without loss of generality, assume that $q > p$. Then
$q + (-p) > 0$, and because $q + (-p)$ is an integer, it follows that $q + (-p) \ge 1$.
Because $bq + r = bp + s$, we then have $s = b(q + (-p)) + r \ge b \cdot 1 + 0 = b$, and this
inequality contradicts the hypothesis on $s$. Hence it cannot be the case that $q \ne p$.

We now prove existence. Again, there are two cases. First, suppose that $a = 0$. Let
$q = 0$ and $r = 0$, which yields $bq + r = b \cdot 0 + 0 = 0 = a$, and $0 \le r < b$.

Second, suppose that $a > 0$. We prove the desired result by induction on $a$. Let
$a = 1$. There are now two subcases. First, suppose that $b = 1$. Let $q = 1$ and $r = 0$,
which yields $bq + r = 1 \cdot 1 + 0 = 1 = a$, and $0 \le r < b$. Second, suppose that $b \ne 1$.
Therefore $b > 1$. Let $q = 0$ and $r = 1$, which yields $bq + r = b \cdot 0 + 1 = 1 = a$, and
$0 \le r < b$. Hence the result is true when $a = 1$.

Now suppose that the result is true for $a$, and we will prove that it is true for $a + 1$.
By the inductive hypothesis there are $g, v \in \mathbb{Z}$ such that $a = bg + v$ and $0 \le v < b$. Then
$v + 1 \le b$. There are now two subcases. First, suppose that $v + 1 < b$. Let $q = g$ and
$r = v + 1$. Hence $bq + r = bg + (v + 1) = (bg + v) + 1 = a + 1$, and $0 \le v < v + 1 =
r < b$. Second, suppose that $v + 1 = b$. Hence $v = b - 1$. Let $q = g + 1$ and
$r = 0$. Then $bq + r = b(g + 1) + 0 = bg + b = bg + (b - 1) + 1 = (bg + v) + 1 = a + 1$,
and $0 \le r < b$. By induction we now see that the result is true for all $a > 0$.
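
Although Theorem 2.8.8 is an existence theorem, the induction in its proof can be unwound into a (very slow) procedure. The following Python sketch, with the hypothetical name `division_algorithm`, starts from the case $a = 0$ and performs the inductive step $a$ times; it is meant only to mirror the two subcases of the proof, and in practice one would use Python's built-in `divmod`, which returns the same pair for these inputs.

```python
def division_algorithm(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, for a >= 0 and b >= 1."""
    if a < 0 or b < 1:
        raise ValueError("need a >= 0 and b >= 1")
    q, r = 0, 0  # the case a = 0
    for _ in range(a):  # pass from a to a + 1, as in the inductive step
        if r + 1 < b:
            r += 1  # first subcase: the remainder can still grow
        else:
            q, r = q + 1, 0  # second subcase: r + 1 == b, so carry into q
    return q, r
```

For the example discussed before the theorem, `division_algorithm(27, 4)` gives quotient 6 and remainder 3.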


The final definition and theorem of this section show that a positive real number 
is rational if and only if it has an eventually repeating base p representation for any p. 
Again, the proof is somewhat lengthier than might be expected. 


Definition 2.8.9. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $x \in (0, \infty)$, and let $x =
b_{k-1} \cdots b_1 b_0 . a_1 a_2 a_3 \cdots$ be a base p representation of $x$. This base p representation is
eventually repeating if there are some $r, s \in \mathbb{N}$ such that $a_j = a_{j+s}$ for all $j \in \mathbb{N}$ such
that $j \ge r$; in that case we write


\[ x = b_{k-1} \cdots b_1 b_0 . a_1 a_2 a_3 \cdots a_{r-1} \overline{a_r \cdots a_{r+s-1}}. \qquad \triangle \]


Theorem 2.8.10. Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $x \in (0, \infty)$. Then $x \in \mathbb{Q}$ if and
only if $x$ has an eventually repeating base p representation.


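Proof. First, suppose that $x$ has an eventually repeating base p representation.
Hence $x = b_{k-1} \cdots b_1 b_0 . a_1 a_2 a_3 \cdots a_{r-1} \overline{a_r \cdots a_{r+s-1}}$ for some $k, r, s \in \mathbb{N}$ and for some
$b_0, b_1, \ldots, b_{k-1}, a_1, a_2, \ldots, a_{r+s-1} \in \{0, \ldots, p-1\}$.

Let


\[ B = \sum_{i=r}^{r+s-1} a_i p^{-i} \quad \text{and} \quad W = \Bigl\{ \sum_{i=r}^{u} a_i p^{-i} \Bigm| u \in \mathbb{N} \text{ and } u \ge r \Bigr\}. \]


As a preliminary step, we will prove that $\operatorname{lub} W = \frac{p^s B}{p^s - 1}$. Let $m \in \mathbb{N}$. Using the fact
that $a_j = a_{j+s}$ for all $j \in \mathbb{N}$ such that $j \ge r$, together with Exercise 2.5.12 (3), we see
that


\[
\begin{aligned}
\frac{p^s B}{p^s - 1} - \sum_{i=r}^{r+ms-1} a_i p^{-i}
&= \frac{p^s B}{p^s - 1} - \sum_{k=1}^{m} \sum_{i=r+(k-1)s}^{r+ks-1} a_i p^{-i} \\
&= \frac{p^s B}{p^s - 1} - \sum_{k=1}^{m} p^{-(k-1)s} \sum_{i=r}^{r+s-1} a_i p^{-i} \\
&= \frac{p^s B}{p^s - 1} - B \sum_{k=1}^{m} p^{-(k-1)s} = \frac{p^s B}{p^s - 1} - B \, \frac{1 - p^{-ms}}{1 - p^{-s}} \\
&= \frac{B}{(p^s - 1) p^{(m-1)s}}.
\end{aligned}
\]


Now let $u \in \mathbb{N}$. Suppose that $u \ge r$. Let $q = u - r + 1$. Then $q \ge 1$, and hence
$q \in \mathbb{N}$. Because $s \in \mathbb{N}$, then $s \ge 1$, and hence $qs \ge q$. It follows that $u \le r + qs - 1$.
Then
\[ \sum_{i=r}^{u} a_i p^{-i} \le \sum_{i=r}^{r+qs-1} a_i p^{-i}, \]
which implies that
\[ \frac{p^s B}{p^s - 1} - \sum_{i=r}^{u} a_i p^{-i} \ge \frac{p^s B}{p^s - 1} - \sum_{i=r}^{r+qs-1} a_i p^{-i} = \frac{B}{(p^s - 1) p^{(q-1)s}} \ge 0. \]


Therefore $\sum_{i=r}^{u} a_i p^{-i} \le \frac{p^s B}{p^s - 1}$. Hence $\frac{p^s B}{p^s - 1}$ is an upper bound of $W$.

Let $\varepsilon > 0$. Clearly $B \ge 0$. If $B = 0$, then $\frac{B}{(p^s - 1) p^{(w-1)s}} = 0 < \varepsilon$ for every $w \in \mathbb{N}$. Now
suppose $B > 0$. By Exercise 2.6.13 there is some $e \in \mathbb{N}$ such that $\frac{1}{p^e} < \frac{(p^s - 1)\varepsilon}{B}$. Let
$w = e + 1$. Because $es \ge e$, we see that


\[ \frac{B}{(p^s - 1) p^{(w-1)s}} = \frac{B}{(p^s - 1) p^{es}} \le \frac{B}{(p^s - 1) p^{e}} = \frac{B}{p^s - 1} \cdot \frac{1}{p^e} < \frac{B}{p^s - 1} \cdot \frac{(p^s - 1)\varepsilon}{B} = \varepsilon. \]


Putting these two cases together, and using a previous calculation, it follows that


\[ \frac{p^s B}{p^s - 1} - \sum_{i=r}^{r+ws-1} a_i p^{-i} = \frac{B}{(p^s - 1) p^{(w-1)s}} < \varepsilon. \]

It follows from Exercise 2.6.6 that $\frac{p^s B}{p^s - 1} = \operatorname{lub} W$, which completes the preliminary
step.

If $n, m \in \mathbb{N}$ and $n \le m$, then $\sum_{i=1}^{n} a_i p^{-i} \le \sum_{i=1}^{m} a_i p^{-i}$. It follows from
Exercise 2.6.3 (1) that


\[ \operatorname{lub} \Bigl\{ \sum_{i=1}^{n} a_i p^{-i} \Bigm| n \in \mathbb{N} \Bigr\} = \operatorname{lub} \Bigl\{ \sum_{i=1}^{u} a_i p^{-i} \Bigm| u \in \mathbb{N} \text{ and } u \ge r \Bigr\}. \]

We now use Exercise 2.8.5 to deduce that


\[
\begin{aligned}
\sum_{i=1}^{\infty} a_i p^{-i}
&= \operatorname{lub} \Bigl\{ \sum_{i=1}^{n} a_i p^{-i} \Bigm| n \in \mathbb{N} \Bigr\}
 = \operatorname{lub} \Bigl\{ \sum_{i=1}^{u} a_i p^{-i} \Bigm| u \in \mathbb{N} \text{ and } u \ge r \Bigr\} \\
&= \operatorname{lub} \Bigl\{ \sum_{i=1}^{r-1} a_i p^{-i} + \sum_{i=r}^{u} a_i p^{-i} \Bigm| u \in \mathbb{N} \text{ and } u \ge r \Bigr\} \\
&= \sum_{i=1}^{r-1} a_i p^{-i} + \operatorname{lub} \Bigl\{ \sum_{i=r}^{u} a_i p^{-i} \Bigm| u \in \mathbb{N} \text{ and } u \ge r \Bigr\} \\
&= \sum_{i=1}^{r-1} a_i p^{-i} + \operatorname{lub} W = \sum_{i=1}^{r-1} a_i p^{-i} + \frac{p^s B}{p^s - 1}.
\end{aligned}
\]


Therefore


\[
\begin{aligned}
x &= \sum_{j=0}^{k-1} b_j p^j + \sum_{i=1}^{\infty} a_i p^{-i} = \sum_{j=0}^{k-1} b_j p^j + \sum_{i=1}^{r-1} a_i p^{-i} + \frac{p^s B}{p^s - 1} \\
&= \sum_{j=0}^{k-1} b_j p^j + \sum_{i=1}^{r-1} a_i p^{-i} + \frac{p^s}{p^s - 1} \sum_{i=r}^{r+s-1} a_i p^{-i}.
\end{aligned}
\]


This last expression is a rational number, being the sum of rational numbers, and
therefore $x \in \mathbb{Q}$.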
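
The final formula in the first half of the proof can be evaluated directly. The Python sketch below does so with exact fractions; the function name `repeating_to_fraction` and the way the digits are passed (integer digits most significant first, then the non-repeating digits $a_1, \ldots, a_{r-1}$, then the repeating block $a_r, \ldots, a_{r+s-1}$) are our own assumptions for the sake of the example.

```python
from fractions import Fraction


def repeating_to_fraction(int_digits, prefix, block, p):
    """Evaluate b_{k-1}...b_0 . a_1...a_{r-1} (a_r...a_{r+s-1} repeated) in base p."""
    if p <= 1 or not block:
        raise ValueError("need p > 1 and a non-empty repeating block")
    r = len(prefix) + 1
    s = len(block)
    integer_part = 0
    for b in int_digits:  # most significant digit first
        integer_part = integer_part * p + b
    # Sum of a_i p**(-i) for i = 1, ..., r-1.
    head = sum((Fraction(a, p ** i) for i, a in enumerate(prefix, start=1)), Fraction(0))
    # B is the sum of a_i p**(-i) for i = r, ..., r+s-1.
    big_b = sum((Fraction(a, p ** i) for i, a in enumerate(block, start=r)), Fraction(0))
    return integer_part + head + Fraction(p ** s, p ** s - 1) * big_b
```

For instance, the decimal $0.1\overline{6}$ evaluates to $1/6$, and $0.\overline{142857}$ evaluates to $1/7$.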

Now suppose that x € Q. We will show that x has an eventually repeating base p 
representation. The key idea is that even though we have already seen a method for 
finding a base p representation of x in the proof of Theorem 2.8.6 (1), we will now 
use a different method that works for the special case when x is a rational number, and 
this alternative method will allow us to show that the resulting base p representation 
is eventually repeating. 

Because $x \in \mathbb{Q}$ and $x > 0$, we know by Lemma 2.4.12 (2) that $x = \frac{c}{d}$ for some $c, d \in
\mathbb{N}$. We now use Definition by Recursion to define numbers $e_0, e_1, e_2, \ldots \in \mathbb{N} \cup \{0\}$ and
$r_0, r_1, r_2, \ldots \in \mathbb{N} \cup \{0\}$. (As in the proof of Theorem 2.8.6 (1), we are actually using
Definition by Recursion to define a single sequence in $(\mathbb{N} \cup \{0\}) \times (\mathbb{N} \cup \{0\})$, though for
convenience we do not write it that way.)

Using the Division Algorithm (Theorem 2.8.8), there are unique $e_0, r_0 \in \mathbb{N} \cup \{0\}$
such that $c = d e_0 + r_0$ and $0 \le r_0 < d$. Using the Division Algorithm again, there are
unique $e_1, r_1 \in \mathbb{N} \cup \{0\}$ such that $p r_0 = d e_1 + r_1$ and $0 \le r_1 < d$. Similarly, there are
unique $e_2, r_2 \in \mathbb{N} \cup \{0\}$ such that $p r_1 = d e_2 + r_2$ and $0 \le r_2 < d$. We continue in this
fashion, obtaining numbers $e_0, e_1, e_2, \ldots \in \mathbb{N} \cup \{0\}$ and $r_0, r_1, r_2, \ldots \in \mathbb{N} \cup \{0\}$ such
that for all $n \in \mathbb{N}$, it is the case that


\[ p r_{n-1} = d e_n + r_n \quad \text{and} \quad 0 \le r_n < d. \tag{2.8.7} \]


It follows from Exercise 2.8.2 (1) that $e_0 \ge 0$, and from Exercise 2.8.2 (2) that
$0 \le e_n < p$ for all $n \in \mathbb{N}$. Hence $e_n \in \{0, \ldots, p-1\}$ for all $n \in \mathbb{N}$.


Let
\[ R = \Bigl\{ e_0 + \sum_{i=1}^{n} e_i p^{-i} \Bigm| n \in \mathbb{N} \Bigr\}. \]
It follows from Exercise 2.8.5 that $R$ has a least upper bound, and that $\operatorname{lub} R =
e_0 + \sum_{i=1}^{\infty} e_i p^{-i}$. We will now show that $\operatorname{lub} R = \frac{c}{d}$, which will imply that $\frac{c}{d} = e_0 +
\sum_{i=1}^{\infty} e_i p^{-i}$.
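
Let $n \in \mathbb{N}$. Then


\[
\begin{aligned}
\frac{c}{d} - \Bigl[ e_0 + \sum_{i=1}^{n} e_i p^{-i} \Bigr]
&= \frac{c - d e_0}{d} - \sum_{i=1}^{n} \frac{p r_{i-1} - r_i}{d} \, p^{-i} \\
&= \frac{1}{d p^n} \Bigl\{ p^n r_0 - \sum_{i=1}^{n} p^{n-i+1} r_{i-1} + \sum_{i=1}^{n} p^{n-i} r_i \Bigr\} \\
&= \frac{1}{d p^n} \Bigl\{ p^n r_0 - \sum_{i=0}^{n-1} p^{n-i} r_i + \sum_{i=1}^{n} p^{n-i} r_i \Bigr\} \\
&= \frac{1}{d p^n} \bigl\{ p^n r_0 - p^n r_0 + p^0 r_n \bigr\} = \frac{r_n}{d p^n} \ge 0.
\end{aligned}
\]

Therefore $e_0 + \sum_{i=1}^{n} e_i p^{-i} \le \frac{c}{d}$. We deduce that $\frac{c}{d}$ is an upper bound of $R$.

Let $\varepsilon > 0$. By Exercise 2.6.13 there is some $u \in \mathbb{N}$ such that $\frac{1}{p^u} < \varepsilon$. From the
above calculation, together with the fact that $r_u < d$, we now see that


\[ \frac{c}{d} - \Bigl[ e_0 + \sum_{i=1}^{u} e_i p^{-i} \Bigr] = \frac{r_u}{d p^u} < \frac{d}{d p^u} = \frac{1}{p^u} < \varepsilon. \]


It now follows from Exercise 2.6.6 that $\frac{c}{d} = \operatorname{lub} R$. Hence $\frac{c}{d} = e_0 + \sum_{i=1}^{\infty} e_i p^{-i}$.

To find the base p representation of $\frac{c}{d}$, there are two cases. If $e_0 = 0$, then
$\frac{c}{d} = \sum_{i=1}^{\infty} e_i p^{-i}$ is a base p representation of $\frac{c}{d}$. Now suppose that $e_0 > 0$. Hence
$e_0 \in \mathbb{N}$. We can therefore apply Theorem 2.8.2 to $e_0$ to obtain unique $v \in \mathbb{N}$ and
$f_0, f_1, \ldots, f_{v-1} \in \{0, \ldots, p-1\}$, such that $f_{v-1} \ne 0$, and that $e_0 = \sum_{i=0}^{v-1} f_i p^i$. It
follows that $\frac{c}{d} = \sum_{i=0}^{v-1} f_i p^i + \sum_{i=1}^{\infty} e_i p^{-i}$ is a base p representation of $\frac{c}{d}$.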

To complete the proof of this theorem, we will show that the numbers $e_1, e_2, e_3, \ldots$
are eventually repeating. Because $r_0, r_1, r_2, \ldots \in \mathbb{N} \cup \{0\}$, and $0 \le r_n < d$ for all $n \in \mathbb{N}$,
we see that there are $d$ possible values in $\mathbb{N} \cup \{0\}$ that each $r_n$ can take. Consider
the $d + 1$ numbers $r_1, r_2, \ldots, r_{d+1}$. Because we have $d + 1$ numbers, and each one of
these numbers can take on one of $d$ possible values, it follows that at least two of
the numbers $r_1, r_2, \ldots, r_{d+1}$ are equal to each other. (Formally, we are using a fact
known as the Pigeonhole Principle, which is discussed and proved in many texts
on combinatorics, for example [Rob84, Section 8.1]; this principle is really just a
theorem about maps of finite sets, and does not need any combinatorial ideas for its
formulation and proof, as seen in [Blo10, Exercise 6.3.17].) Hence $r_s = r_t$ for some
$s, t \in \{1, 2, \ldots, d+1\}$, where $s < t$.

Recall that the numbers $e_0, e_1, e_2, \ldots \in \mathbb{N} \cup \{0\}$ and $r_0, r_1, r_2, \ldots \in \mathbb{N} \cup \{0\}$ were
defined using Definition by Recursion, making use of the Division Algorithm
(Theorem 2.8.8); these numbers satisfy Equation 2.8.7 for all $n \in \mathbb{N}$. According to the
Division Algorithm, if we know the number $r_{k-1}$ for some $k \in \mathbb{N}$, then the numbers
$e_k$ and $r_k$ are uniquely determined. Hence, because $r_s = r_t$, then $r_{s+1} = r_{t+1}$, and then
$r_{s+2} = r_{t+2}$ and so on. From this we deduce that the numbers $r_0, r_1, r_2, \ldots$ can be
written as $r_0, r_1, \ldots, r_{s-1}$ followed by the numbers $r_s, r_{s+1}, \ldots, r_{t-1}$ repeated. Because
each number $e_k$ is determined uniquely by $r_{k-1}$ for all $k \in \mathbb{N}$, then the numbers
$e_1, e_2, e_3, \ldots$ can be written as $e_1, e_2, \ldots, e_s$ followed by the numbers $e_{s+1}, e_{s+2}, \ldots, e_t$
repeated. Therefore $\frac{c}{d} = f_{v-1} \cdots f_1 f_0 . e_1 e_2 e_3 \cdots e_s \overline{e_{s+1} \cdots e_t}$, where the part before the
point is replaced by 0 in the case $e_0 = 0$.
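
The second half of the proof is, in effect, long division with bookkeeping of the remainders. A minimal Python sketch of it is given below; the name `repeating_expansion` is hypothetical, the built-in `divmod` stands in for the Division Algorithm, and, as a small deviation from the proof, the search for a repeated remainder starts at $r_0$ rather than $r_1$, which can only find a repetition sooner.

```python
def repeating_expansion(c, d, p):
    """Return (e_0, prefix, block) for c/d in base p, using the recursion (2.8.7).

    prefix lists the digits e_1, ..., e_s before the repetition starts, and
    block lists the digits e_{s+1}, ..., e_t that repeat from then on.
    """
    if c < 1 or d < 1 or p <= 1:
        raise ValueError("need c >= 1, d >= 1 and p > 1")
    e0, r = divmod(c, d)  # c = d*e_0 + r_0 with 0 <= r_0 < d
    digits = []  # digits[n] will hold e_{n+1}
    seen = {}  # maps a remainder value to the first index n with r_n equal to it
    n = 0
    while r not in seen:
        seen[r] = n
        e, r = divmod(p * r, d)  # p*r_n = d*e_{n+1} + r_{n+1}
        digits.append(e)
        n += 1
    s = seen[r]  # now r_s == r_n with s < n, so e_{s+1}, ..., e_n repeat
    return e0, digits[:s], digits[s:]
```

Since each remainder lies in $\{0, \ldots, d-1\}$, the loop stops after at most $d$ steps, which is the Pigeonhole Principle argument of the proof in computational form. For example, $1/6$ in base 10 gives the non-repeating digit 1 followed by the repeating block 6, and a terminating expansion such as $1/2$ shows up with the repeating block 0.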


Reflections 


The goal of this section is to prove some facts about the real numbers that are 
so familiar they are usually taken for granted, and it is rather surprising that the 
proofs in this section are as complicated as they are. The reason for these lengthy and
technical proofs is that decimal expansions involve infinitely many numbers after
the decimal point, which means that decimal expansions are a type of infinite sum. In 
contrast to finite sums, infinite sums do not always exist in general, and the existence 
of the particular infinite sums used for decimal expansions requires the Least Upper 
Bound Property. The use of this property explains why none of the familiar facts about 
decimal expansions are proved rigorously when students first learn about decimal 
expansions in elementary school, or even subsequently in high school or college 
calculus courses. 

Given that decimal expansions are infinite sums, there is a slightly easier way 
to prove some of the results in the present section than we have seen here, which 
is by using series rather than least upper bounds. Of course, doing so requires a 
rigorous treatment of series, which is found in Chapter 9, and a rigorous treatment 
of series ultimately relies upon the Least Upper Bound Property, so using series to 
study decimal expansions does not bypass the Least Upper Bound Property, it simply 
hides that property inside the study of series. There are no free results in mathematics, 
and in the present case we can either have longer proofs of the properties of decimal 
expansions while avoiding a preliminary treatment of series, or we can have shorter 
proofs of the properties of decimal expansions after having studied series rigorously. 
We chose the former approach in the present section, though it is a judgment call 
which method is preferable. Once the reader has learned some facts about series in 
Sections 9.2 and 9.3, it will be left to the reader in Exercise 9.3.8 and Exercise 9.3.9 
to provide simplified proofs of some parts of the present section. 


Exercises 


Exercise 2.8.1. Here is a magic trick that you can perform. First, photocopy and cut out the six cards shown in Figure 2.8.1 (or make your own fancier versions of them). Then, ask a volunteer to pick a whole number from 1 to 60. Give the volunteer the six cards, and ask her to select those cards that have the chosen number on them (anywhere from one to five cards will have the chosen number). Take the selected cards, and say some appropriate magic words. While you do that, add up in your head the numbers in the upper left-hand corners of the selected cards, and that sum will be the chosen number, which you should announce to the audience with appropriate fanfare. (As an alternative, you could say that you are going to guess the volunteer’s age, and then ask the volunteer to select the cards that have her age on them; make sure the person you select is not over 60.)

The mathematical question is: explain why this trick works. Use a result from this section.


1 3 5 7 9
11 13 15 17 19 
21 23 25 27 29 
31 33 35 37 39 
41 43 45 47 49 
51 53 55 57 59 


2 3 6 7 10 
11 14 15 18 19 
22 23 26 27 30 
31 34 35 38 39 
42 43 46 47 50 
51 54 55 58 59


4 5 6 7 12
13 14 15 20 21 
22 23 28 29 30 
31 36 37 38 39 
44 45 46 47 52 
53 54 55 60 


8 9 10 11 12 
13 14 15 24 25 
26 27 28 29 30 
31 40 41 42 43 
44 45 46 47 56 
57 58 59 60 


16 17 18 19 20 
21 22 23 24 25 
26 27 28 29 30 
31 48 49 50 51 
52 53 54 55 56 
57 58 59 60 


32 33 34 35 36 
37 38 39 40 41 
42 43 44 45 46 
47 48 49 50 51 
52 53 54 55 56 
57 58 59 60 


Fig. 2.8.1. 
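Before explaining the trick, it can be helpful to check by machine that it never fails. The sketch below rests on an observation that is ours, not a statement quoted from the text: the six cards in Figure 2.8.1 appear to be built from base 2 digits, with card j listing the numbers from 1 to 60 whose digit in position j is 1. The function names are hypothetical.

```python
def make_cards(limit=60, num_cards=6):
    """Card j lists every n in 1..limit whose base 2 digit in position j is 1."""
    return [[n for n in range(1, limit + 1) if (n >> j) & 1]
            for j in range(num_cards)]


def guess(cards, chosen):
    """Add the upper left-hand (first) entries of the cards that contain `chosen`."""
    return sum(card[0] for card in cards if chosen in card)


cards = make_cards()
for card in cards:
    print(card[:5], "...")        # compare with the first rows in Figure 2.8.1

# The trick recovers every number from 1 to 60.
assert all(guess(cards, n) == n for n in range(1, 61))
```

The computation confirms that the trick works; explaining why is still the content of the exercise.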


Exercise 2.8.2. [Used in Theorem 2.8.10.] Let $a, b, q, r \in \mathbb{N} \cup \{0\}$. Suppose that $b > 0$, that $a = bq + r$ and that $0 \le r < b$.


(1) Prove that $0 \le q \le \frac{a}{b}$.
(2) Suppose that $a = xy$ for some $x, y \in \mathbb{N} \cup \{0\}$ such that $x < b$. Prove that $0 \le q < y$.


Exercise 2.8.3. [Used in Lemma 2.8.5.] Prove Lemma 2.8.5 (1) (3). 


Exercise 2.8.4. [Used in Theorem 2.8.2, Lemma 2.8.3, Lemma 2.8.5 and Theorem 2.8.6.] 
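Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $k \in \mathbb{N}$.


(1) Let $a_0, a_1, \ldots, a_{k-1} \in \{0, \ldots, p-1\}$. Prove that $\sum_{i=0}^{k-1} a_i p^i < p^k$.
(2) Let $a_0, a_1, \ldots, a_{k-1} \in \{0, \ldots, p-1\}$, and let $r, s \in \{0, \ldots, k-1\}$. Suppose that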
r<s. Prove that Y3_.aip"' < a = re and that equality holds if and only if 
a; = p—1 for alli € {0,...,k—1}. 
(3) Let $b_0, b_1, \ldots, b_k \in \mathbb{Z}$. Suppose that $|b_i| \le p - 1$ for all $i \in \{0, \ldots, k\}$. Prove
that if $\sum_{i=0}^{k} b_i p^i = 0$, then $b_i = 0$ for all $i \in \{0, \ldots, k\}$.
[Use Exercise 2.5.3 and Exercise 2.5.12.] 


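Exercise 2.8.5. [Used in Theorem 2.8.10.] Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $b \in \mathbb{R}$ and $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$. Let $r \in \mathbb{N}$, and let

$$S = \left\{ b + \sum_{i=r}^{u} a_i p^{-i} \;\middle|\; u \in \mathbb{N} \text{ and } u \ge r \right\}.$$

Prove that $S$ has a least upper bound, and that

$$\operatorname{lub} S = b + \operatorname{lub} \left\{ \sum_{i=r}^{u} a_i p^{-i} \;\middle|\; u \in \mathbb{N} \text{ and } u \ge r \right\}.$$

In particular, when $r = 1$, it follows that

$$\operatorname{lub} S = b + \sum_{i=1}^{\infty} a_i p^{-i}.$$

[Use Exercise 2.6.9.]
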
Exercise 2.8.6. [Used in Theorem 2.8.6.] Prove Theorem 2.8.6 (2) (3). 


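Exercise 2.8.7. [Used in Theorem 2.8.6.] Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $a_1, a_2, a_3, \ldots \in \{0, \ldots, p-1\}$ and $e_1, e_2, e_3, \ldots \in \{0, \ldots, p-1\}$. Suppose that there is no $m \in \mathbb{N}$ such that $a_i = p - 1$ for all $i \in \mathbb{N}$ such that $i > m$, or that $e_i = p - 1$ for all $i \in \mathbb{N}$ such that $i > m$. Prove that if $\sum_{i=1}^{\infty} a_i p^{-i} = \sum_{i=1}^{\infty} e_i p^{-i}$, then $a_i = e_i$ for all $i \in \mathbb{N}$.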


Exercise 2.8.8. [Used in Exercise 2.8.9.] In Theorem 2.8.8 we stated the Division Algorithm for $a \in \mathbb{N} \cup \{0\}$ and $b \in \mathbb{N}$. In fact, the Division Algorithm works for any $a, b \in \mathbb{Z}$ such that $b \ne 0$. Prove the following statement. Let $a, b \in \mathbb{Z}$. Suppose that $b \ne 0$. Then there are unique $q, r \in \mathbb{Z}$ such that $a = bq + r$ and $0 \le r < |b|$. (There is no need to prove again what was proved in Theorem 2.8.8; use what was proved in that theorem, and prove only what was not proved in the text.)
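The statement of Exercise 2.8.8 can be explored numerically before it is proved. The following is a minimal sketch; the function name is ours. It relies on the fact that Python's built-in `divmod` returns a remainder with the same sign as the divisor, so one adjustment is needed when the divisor is negative in order to land in the range from 0 up to, but not including, the absolute value of b.

```python
def division_algorithm(a, b):
    """Return integers (q, r) with a == b*q + r and 0 <= r < |b|, for b != 0."""
    if b == 0:
        raise ValueError("b must be non-zero")
    q, r = divmod(a, b)        # here r has the same sign as b (or is 0)
    if r < 0:                  # can happen only when b < 0
        q, r = q + 1, r - b    # b*(q+1) + (r - b) == b*q + r, and now 0 < r - b < |b|
    return q, r


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2), (6, -3)]:
    q, r = division_algorithm(a, b)
    assert a == b * q + r and 0 <= r < abs(b)
    print(a, b, q, r)
```

A check of this kind illustrates existence for particular values only; uniqueness, and existence in general, still require the proof requested in the exercise.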


Exercise 2.8.9. Let $n \in \mathbb{Z}$. The integer $n$ is even if there is some $k \in \mathbb{Z}$ such that $n = 2k$; the integer $n$ is odd if there is some $k \in \mathbb{Z}$ such that $n = 2k + 1$.
Prove that every integer is either even or odd, but not both. 
[Use Exercise 2.3.5 (2) and Exercise 2.8.8. ] 


2.9 Historical Remarks 


Please see Section 1.8. 


3 


Limits and Continuity 


3.1 Introduction 


Having considered the fundamental properties of the real numbers in Chapters 1 and 2,
we now commence our study of real analysis proper. The heart of real analysis, and one 
of the key features that distinguishes real analysis from algebraic and combinatorial 
branches of mathematics, is the concept of a limit. There are various types of limits, 
for example limits of functions (which we will discuss in the present section), and 
limits of sequences (to be discussed in Section 8.2). However, all of these types of 
limits have similar features, and gaining familiarity with one type of limit will make 
learning about the other types much easier. The reader has already encountered limits 
of functions in an intuitive fashion in calculus courses. However, limits take on a 
much more important role in real analysis than in calculus because in the former we 
are concerned with rigorous proofs rather than applications, and limits are at the heart 
of the rigorous treatment of calculus. 

We start this chapter with a treatment of limits of functions in Section 3.2, to 
be followed by the closely related topic of continuity in Section 3.3. In Section 3.4 
we discuss the somewhat technical concept of uniform continuity, which is a very 
useful variant of the more familiar notion of continuity. The discussion of uniform 
continuity involves the first substantial proof of the chapter (not surprisingly, a proof 
that relies upon the Least Upper Bound Property). The concluding section of this 
chapter, Section 3.5, has further substantial proofs, first of the Extreme Value Theorem 
and the Intermediate Value Theorem, and finally of the fact that the Heine—Borel 
Theorem, the Extreme Value Theorem and the Intermediate Value Theorem are all 
logically equivalent to the Least Upper Bound Property. 


3.2 Limits of Functions 


To understand the need for limits, consider the familiar formula for the definition of the derivative of a function $f$ at the number $c$, which is

$$\lim_{h \to 0} \frac{f(c+h) - f(c)}{h}.$$

The reader is familiar with this formula from calculus courses; we will discuss this formula from a rigorous point of view in detail in Chapter 4. For now, we want to highlight the use of the limit in the definition of derivatives. What this limit tells us is that we want to evaluate the fraction $\frac{f(c+h) - f(c)}{h}$ for values of $h$ that approach 0, but, and this is the important point to note, not when $h = 0$. In this particular case we could not evaluate the fraction at $h = 0$ even if we had wanted to; if we were to try to substitute $h = 0$ into this fraction we would have 0 in each of the numerator and the denominator, which does not yield a real number.
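The behavior of this fraction near, but not at, the value 0 can be seen in a small experiment. The following sketch uses a sample function of our own choosing (squaring, at the number 3), and the helper name is hypothetical; it evaluates the fraction for several small non-zero values and refuses to evaluate it at 0.

```python
def difference_quotient(f, c, h):
    """Evaluate (f(c + h) - f(c)) / h, which is defined only for h != 0."""
    if h == 0:
        raise ValueError("the quotient is 0/0 at h = 0, which is not a real number")
    return (f(c + h) - f(c)) / h


def square(x):
    return x * x


for h in [0.1, 0.01, 0.001, -0.001, -0.01]:
    # The values settle near 6 as h approaches 0 from either side.
    print(h, difference_quotient(square, 3.0, h))
```

Such a table only suggests what the limit ought to be; it is the rigorous definition given later in this section that makes the statement precise.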

In general, as is often stated informally in calculus courses, the intuitive idea of a 
limit of a function f as x goes to a number c in the domain of f is that the value of 
f (x) gets closer and closer to a number L as the value of x gets closer and closer to c. 
(As always, note the distinction between the name of the function, which is f, and 
the value of the function at x, which is f(x).) Not every function has a limit at each 
number c. It is important to stress that if we look at the values of f(x) as x gets closer 
and closer to c, we never consider the value of f at c; in fact, the function f need not 
be defined at c for the limit of f to exist as x goes to c. 

More formally, when we look at limits of functions, we will most often consider functions of the form $f\colon I - \{c\} \to \mathbb{R}$, where $I \subseteq \mathbb{R}$ is an open interval and $c \in I$ is a number. See Exercise 3.2.17 for a discussion of whether it is necessary to restrict our attention to $I$ being an open interval, or whether more general sets would work; in practice, however, it usually suffices to look at open intervals, and we will therefore do so.

In the informal discussion of limits in calculus courses, we often have the function $f$ defined on the whole interval $I$, including at $c$, but to be able to deal with the most general situation involving limits, we consider functions defined on $I - \{c\}$. Of course, if a function is in fact defined on all of $I$, then by abuse of notation we can also think of the same function as being defined on $I - \{c\}$, so there is no harm in having the function defined at $c$, but it is important that it is not required that the function be defined at $c$ (for example to allow for the limit used in the definition of derivatives). And if the function is defined at $c$, then it is very important to note that the value of $f(c)$ plays no role in the limit of $f$ as $x$ goes to $c$.

The phrase “gets closer and closer,” which is used in the informal approach to limits, is not at all usable in a rigorous definition, and we need to replace it by something more precise. In fact, not only is the phrase “getting closer and closer” a fuzzy description of what happens in limits, it is simply incorrect. Consider first the function $h\colon \mathbb{R} - \{0\} \to \mathbb{R}$ defined by $h(x) = x^2 + 3$ for all $x \in \mathbb{R} - \{0\}$. The graph of this function is seen in Figure 3.2.1 (i). To ask whether the limit of $h$ exists at $c = 0$, we observe that indeed $h(x)$ gets closer and closer to 3 as $x$ gets closer and closer to 0, in exactly the sense that the phrase “gets closer and closer” was intuitively intended. Whatever the rigorous definition of limits is, it is certainly the case that the limit of $h$ as $x$ goes to 0 ought to be 3. On the other hand, let $g\colon \mathbb{R} - \{0\} \to \mathbb{R}$ be defined by

$$g(x) = \begin{cases} x + 3, & \text{if } x > 0 \\ x - 3, & \text{if } x < 0. \end{cases}$$


The graph of g is seen in Figure 3.2.1 (ii). Whatever the rigorous definition of limits 
is, the graph of this function tells us that the limit of g as x goes to 0 ought not to exist. 
And yet, it is the case that as x gets closer and closer to 0, the values of g(x) also get 
closer and closer to 0. The point is that it is not sufficient for the values of g(x) to 
get closer and closer to a given number, they have to get arbitrarily close to the given 
number, and it is the measure of arbitrary closeness that is missing from the phrase 
“gets closer and closer.” 
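The distinction between “closer and closer” and “arbitrarily close” can be made concrete with a few computed values. The sketch below assumes that the function in question is the one suggested by the discussion, namely x + 3 for positive x and x − 3 for negative x; the sample points are our own choice.

```python
def g(x):
    """Assumed form of g: x + 3 for x > 0, and x - 3 for x < 0."""
    if x == 0:
        raise ValueError("g is not defined at 0")
    return x + 3 if x > 0 else x - 3


# Sample points approaching 0 from the right and from the left.
right = [1 / 2 ** n for n in range(1, 20)]
left = [-x for x in right]

for points in (right, left):
    distances = [abs(g(x) - 0) for x in points]     # distance from g(x) to 0
    # The distances strictly decrease, so g(x) does get "closer and closer" to 0 ...
    assert all(a > b for a, b in zip(distances, distances[1:]))
    # ... but they never drop to 3 or below, so g(x) is not arbitrarily close to 0.
    assert all(d > 3 for d in distances)

print(g(right[-1]), g(left[-1]))    # values near 3 and near -3
```

The values from the right stay near 3 and those from the left stay near −3, which is the numerical counterpart of the claim that the limit of this function as x goes to 0 ought not to exist.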


(i) (ii) 


Fig. 3.2.1. 


To measure “arbitrary closeness,” we use an arbitrarily chosen positive number, often denoted with a symbol such as $\varepsilon$ or $\delta$. Let us now rewrite the phrase “the value of $f(x)$ gets closer and closer to a number $L$ as the value of $x$ gets closer and closer to $c$” using measures of closeness. We start with the first part, relating the closeness of $f(x)$ to $L$, where we will use $\varepsilon$ to denote our measure of closeness. The idea of a limit existing is as follows: if for each possible choice of $\varepsilon > 0$, no matter how small, we can show that for all $x$ sufficiently close to $c$ (though not equal to $c$), the value of $f(x)$ will be within distance $\varepsilon$ of $L$, then we will say that the limit of $f$ as $x$ goes to $c$ is $L$. We will use $\delta$ to denote the measure of closeness of $x$ to $c$. Then, if for every possible choice of $\varepsilon > 0$, no matter how small, we can show that there is some $\delta > 0$ such that for all $x$ within distance $\delta$ of $c$ (though not including $c$ itself), the value of $f(x)$ will be within distance $\varepsilon$ of $L$, we will say that the limit of $f$ as $x$ goes to $c$ is $L$. To say that $f(x)$ is within distance $\varepsilon$ of $L$ is to say that $|f(x) - L| < \varepsilon$, and to say that $x$ is within distance $\delta$ of $c$, but $x$ is not equal to $c$, is to say that $x \in I - \{c\}$ and $|x - c| < \delta$. We then see that the rigorous way to say “the value of $f(x)$ gets closer and closer to a number $L$ as the value of $x$ gets closer and closer to $c$” is to say that for each $\varepsilon > 0$, there is some $\delta > 0$ such that for all $x \in I - \{c\}$ such that $|x - c| < \delta$, it is the case that $|f(x) - L| < \varepsilon$. As seen in Figure 3.2.2, the expression “for all $x \in I - \{c\}$ such that $|x - c| < \delta$, it is the case that $|f(x) - L| < \varepsilon$” can be viewed graphically by saying that $f(x)$ is within a band of width $2\varepsilon$ centered at $L$ whenever $x$ is in $I - \{c\}$ and $x$ is within a band of width $2\delta$ centered at $c$.


The above considerations lead us to the following definition. 


Definition 3.2.1. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$, let $f\colon I - \{c\} \to \mathbb{R}$ be a function and let $L \in \mathbb{R}$. The number $L$ is the limit of $f$ as $x$ goes to $c$, written

$$\lim_{x \to c} f(x) = L,$$

if for each $\varepsilon > 0$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $|f(x) - L| < \varepsilon$. If $\lim_{x \to c} f(x) = L$, we also say that $f$ converges to $L$ as $x$ goes to $c$. If $f$ converges to some real number as $x$ goes to $c$, we say that $\lim_{x \to c} f(x)$ exists. △


Definition 3.2.1 must be used precisely as stated. First, it is very important that we use a symbol such as $\varepsilon$ to measure the closeness of $f(x)$ to $L$, not a specific numerical value, no matter how small. A symbol such as $\varepsilon$ could be any possible positive number, and is not any one specific numerical value. If we find a “$\delta$” for only some specific numerical values of $\varepsilon$, we will not have proved that the limit exists.

Second, the order of the quantifiers in the definition of limits is absolutely crucial. The definition of the limit of a function can be written in logical symbols as

$$(\forall \varepsilon > 0)(\exists \delta > 0)\,[(x \in I - \{c\} \wedge |x - c| < \delta) \rightarrow |f(x) - L| < \varepsilon].$$

The order of the quantifiers cannot be changed. If we want to prove that $\lim_{x \to c} f(x) = L$, the proof must start with choosing an arbitrary $\varepsilon > 0$. Next, after possible argumentation, a value of $\delta > 0$ must be given, where $\delta$ may depend upon $\varepsilon$, $c$ and $f$. We then choose an arbitrary $x \in I - \{c\}$ such that $|x - c| < \delta$. (Observe that saying only “$|x - c| < \delta$” guarantees neither that $x \in I$ nor that $x \ne c$, and hence we need to say “$x \in I - \{c\}$ and $|x - c| < \delta$” when we are describing $x$.) Finally, again after possible argumentation, we must deduce that $|f(x) - L| < \varepsilon$. It is important that the arbitrary choices are indeed arbitrary. A typical proof that $\lim_{x \to c} f(x) = L$ must therefore have the following form:


Proof. Let $\varepsilon > 0$.

(argumentation)

Let $\delta = \ldots$.

(argumentation)

Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$.

(argumentation)

Therefore $|f(x) - L| < \varepsilon$.


Such proofs are often called “$\varepsilon$-$\delta$ proofs.” Learning to construct correct $\varepsilon$-$\delta$ proofs may take some practice, but the effort is very worthwhile, because this type of argument will be used in many places in real analysis.

We will see some examples of limits shortly, but first we need a very important lemma. Although it is not stated in Definition 3.2.1 that the number “$L$” in the definition is unique, it turns out that if $\lim_{x \to c} f(x) = L$ for some $L \in \mathbb{R}$, then there is only one such number $L$. In other words, if a function has a limit as $x$ goes to $c$, that means there is a single number $L$ that $f(x)$ is getting closer and closer to; if there is no such number, then there is no limit.


Lemma 3.2.2. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be a function. If $\lim_{x \to c} f(x) = L$ for some $L \in \mathbb{R}$, then $L$ is unique.


Proof. Suppose that $\lim_{x \to c} f(x) = L_1$ and $\lim_{x \to c} f(x) = L_2$ for some $L_1, L_2 \in \mathbb{R}$ such that $L_1 \ne L_2$. Let $\varepsilon = \frac{|L_1 - L_2|}{2}$. Then $\varepsilon > 0$. By the definition of limits there is some $\delta_1 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_1$ imply $|f(x) - L_1| < \varepsilon$, and there is some $\delta_2 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply $|f(x) - L_2| < \varepsilon$. Let $\delta = \min\{\delta_1, \delta_2\}$. Choose some $x \in I - \{c\}$ such that $|x - c| < \delta$; such $x$ exists by Exercise 2.3.8. Then $|x - c| < \delta_1$ and $|x - c| < \delta_2$, and hence

$$|L_1 - L_2| = |L_1 - f(x) + f(x) - L_2| \le |L_1 - f(x)| + |f(x) - L_2|$$

$$= |f(x) - L_1| + |f(x) - L_2| < \varepsilon + \varepsilon = 2\varepsilon = 2 \cdot \frac{|L_1 - L_2|}{2} = |L_1 - L_2|,$$

which is a contradiction. We deduce that if $\lim_{x \to c} f(x) = L$ for some $L \in \mathbb{R}$, then $L$ is unique.


Because of Lemma 3.2.2 we can refer to “the” limit of a function at c, if the limit 
exists. 


Example 3.2.3. In the first three parts of this example, we will first do some scratch work prior to the actual proof. It is often necessary to do scratch work as a first step when working with the $\varepsilon$-$\delta$ definition of limits, though it is important to avoid confusing the scratch work (which often involves working backwards from the desired conclusion) with the proof.


(1) We will prove that $\lim_{x \to 4} (5x + 1) = 21$. (In principle, we should have stated that the function under consideration is $f\colon \mathbb{R} - \{4\} \to \mathbb{R}$ defined by $f(x) = 5x + 1$ for all $x \in \mathbb{R} - \{4\}$, but that is implicitly clear, and we will not write out the name of the function in other similar situations.)


Scratch Work We work backwards for our scratch work. We want to conclude that $|(5x + 1) - 21| < \varepsilon$, which is the same as $|5x - 20| < \varepsilon$, which is equivalent to $5|x - 4| < \varepsilon$, which in turn is the same as $|x - 4| < \frac{\varepsilon}{5}$. We now see that $\delta = \frac{\varepsilon}{5}$ ought to work, though we will only be sure that it works when we try to write the proof up properly, which means “forwards.”


Actual Proof Let $\varepsilon > 0$. Let $\delta = \frac{\varepsilon}{5}$. Suppose that $x \in \mathbb{R} - \{4\}$ and $|x - 4| < \delta$. Then

$$|(5x + 1) - 21| = |5x - 20| = 5|x - 4| < 5\delta = 5 \cdot \frac{\varepsilon}{5} = \varepsilon.$$
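The choice of δ as one fifth of ε in Part (1) can be spot-checked by computation. The following sketch uses exact rational arithmetic and hypothetical helper names of our own; it samples finitely many points within distance δ of 4 (excluding 4 itself) and tests the required inequality at each. A sampled check of this kind is only a sanity test, and it is the proof above that establishes the limit.

```python
from fractions import Fraction


def delta_for(eps):
    """The delta chosen in the proof of Part (1): delta = eps / 5."""
    return eps / 5


def check_limit(eps, samples=200):
    """Test |(5x + 1) - 21| < eps at sample points x with 0 < |x - 4| < delta."""
    delta = delta_for(eps)
    for k in range(1, samples + 1):
        offset = delta * Fraction(k, samples + 1)    # 0 < offset < delta
        for x in (4 + offset, 4 - offset):
            if not abs((5 * x + 1) - 21) < eps:
                return False
    return True


# Try several values of eps; each must have its own delta.
print(all(check_limit(Fraction(1, n)) for n in (1, 10, 1000)))
```

Observe that the code mirrors the order of the quantifiers: ε is given first, δ is computed from ε, and only then are values of x chosen.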

(2) We will prove that $\lim_{x \to 3} (x^2 - 1) = 8$.


Scratch Work Again, we work backwards for our scratch work. We want $|(x^2 - 1) - 8| < \varepsilon$, which is $|x^2 - 9| < \varepsilon$, which is $|(x - 3)(x + 3)| < \varepsilon$, which is $|x - 3| < \frac{\varepsilon}{|x + 3|}$. We cannot take $\delta = \frac{\varepsilon}{|x + 3|}$, because $\delta$ must be a number, whereas “$x$” would not have a fixed value at this point in the proof. The number $\delta$ can depend upon $\varepsilon$ and 3 (which is “$c$” here), both of which come before $\delta$ in the proper order of the quantifiers in the definition of limits, but “$x$” comes after $\delta$ in the definition, and indeed the choice of $x$ depends upon $\delta$, so $\delta$ cannot depend upon $x$. Fortunately, we can define $\delta$ properly by using the following trick, which will give us a bound on the possible values of $|x + 3|$. Suppose that $|x - 3| < 1$. Then $-1 < x - 3 < 1$, which implies that $2 < x < 4$, and therefore $5 < x + 3 < 7$, and hence $5 < |x + 3| < 7$. We now choose $\delta = \min\{\frac{\varepsilon}{7}, 1\}$.


Actual Proof Let $\varepsilon > 0$. Let $\delta = \min\{\frac{\varepsilon}{7}, 1\}$. Suppose that $x \in \mathbb{R} - \{3\}$ and $|x - 3| < \delta$. Then $|x - 3| < 1$, which implies that $-1 < x - 3 < 1$, and therefore $2 < x < 4$, and hence $5 < x + 3 < 7$, and we conclude that $5 < |x + 3| < 7$. Then

$$|(x^2 - 1) - 8| = |x^2 - 9| = |x - 3| \cdot |x + 3| < \delta \cdot 7 \le \frac{\varepsilon}{7} \cdot 7 = \varepsilon.$$

(3) We will prove that $\lim_{x \to 0} \frac{1}{x}$ does not exist.


Scratch Work The graph of the function $f\colon \mathbb{R} - \{0\} \to \mathbb{R}$ defined by $f(x) = \frac{1}{x}$ for all $x \in \mathbb{R} - \{0\}$ is seen in Figure 3.2.3 (i), and it is evident that as $x$ goes to 0 from the right-hand side, the values of $f(x)$ go to positive infinity, and that as $x$ goes to 0 from the left-hand side, the values of $f(x)$ go to negative infinity. We now see two intuitive reasons why $\lim_{x \to 0} \frac{1}{x}$ does not exist. First, it is important to stress that when we write “$\lim_{x \to c} f(x) = L$,” we always mean that $L$ is a real number. Although we can write the symbol “$\infty$” to represent “infinity,” the symbol $\infty$ is not a real number. If one writes “$\lim_{x \to c} f(x) = \infty$” (which we will not do in the present chapter, but which we will do in Chapter 6), then one is not actually writing a limit in the sense that we are discussing at present; the symbol “$\infty$” in this context means that the value of $f(x)$ grows without bound. The other intuitive reason that $\lim_{x \to 0} \frac{1}{x}$ does not exist is that as $x$ goes to 0 from the right-hand side the values of $f(x)$ do one thing, and as $x$ goes to 0 from the left-hand side the values of $f(x)$ do a different thing. For a limit to exist, the values of $f(x)$ must do the same thing no matter how $x$ approaches $c$, as will be clarified by Lemma 3.2.17 below.


(i) (ii)

Fig. 3.2.3.


A proof that a limit does not exist requires the $\varepsilon$-$\delta$ definition just as much as a proof that a limit does exist. In the present case we will use proof by contradiction. More precisely, we suppose that $\lim_{x \to 0} \frac{1}{x} = L$ for some $L \in \mathbb{R}$. We will then find some $\varepsilon > 0$ for which there is no appropriate $\delta$, which means that for each $\delta > 0$, there is some $x \in \mathbb{R} - \{0\}$ such that $|x - 0| < \delta$ and yet $\left|\frac{1}{x} - L\right| \not< \varepsilon$. Let $\delta > 0$. We now work backwards, and notice that what we need is to have $\frac{1}{x} - L \le -\varepsilon$ or $\frac{1}{x} - L \ge \varepsilon$. It suffices to find $x$ such that one of these inequalities holds. There are three cases, depending upon whether $L > 0$ or $L = 0$ or $L < 0$. Suppose that $L > 0$. We want $\frac{1}{x} - L \ge \varepsilon$, which is the same as $\frac{1}{x} \ge L + \varepsilon$, and so we need $x \le \frac{1}{L + \varepsilon}$. This last step works with any choice of $\varepsilon$ because $L > 0$. We also need to have $|x - 0| < \delta$, and so we will choose $x = \min\{\frac{\delta}{2}, \frac{1}{L + \varepsilon}\}$. We will not give the full details of the cases where $L < 0$ and where $L = 0$, except to note that when $L < 0$ we need to choose some $\varepsilon > 0$ so that $L + \varepsilon < 0$, for example we could choose $\varepsilon = \frac{|L|}{2}$, and so we might as well use that $\varepsilon$ when $L > 0$ as well; when $L = 0$ we cannot use $\varepsilon = \frac{|L|}{2}$ because we need $\varepsilon > 0$, but any choice of positive $\varepsilon$ will work, for example $\varepsilon = 1$.


Actual Proof Suppose that $\lim_{x \to 0} \frac{1}{x} = L$ for some $L \in \mathbb{R}$. Let $\varepsilon = \frac{|L|}{2}$ if $L \ne 0$, and let $\varepsilon = 1$ if $L = 0$. We consider the case when $L > 0$; the other cases are similar, and the details are left to the reader. Let $\delta > 0$. Because $L > 0$, then $L + \varepsilon > 0$. Let $x = \min\{\frac{\delta}{2}, \frac{1}{L + \varepsilon}\}$. Then $x \in (0, \infty)$ and $|x - 0| \le \frac{\delta}{2} < \delta$. On the other hand, because $x \le \frac{1}{L + \varepsilon}$, it follows that $L + \varepsilon \le \frac{1}{x}$, and hence $\frac{1}{x} - L \ge \varepsilon$, which implies that $\left|\frac{1}{x} - L\right| \ge \varepsilon$.
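The choice of x in the proof of Part (3) can be tested for particular values. The sketch below treats only the case of a positive candidate limit, as the proof does, and uses a hypothetical helper name and sample values of our own; exact rational arithmetic avoids any rounding issues.

```python
from fractions import Fraction


def witness(L, delta):
    """For L > 0 and delta > 0, return (eps, x) chosen as in the proof of Part (3)."""
    eps = L / 2                              # eps = |L| / 2, because L > 0
    x = min(delta / 2, 1 / (L + eps))
    return eps, x


L = Fraction(3)                              # a sample candidate for the limit
for delta in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    eps, x = witness(L, delta)
    assert 0 < x < delta                     # x is within delta of 0, and x is not 0
    assert abs(1 / x - L) >= eps             # yet 1/x is not within eps of L
    print(delta, x, abs(1 / x - L))
```

No matter how small δ is taken, the same ε defeats it, which is exactly what it means for the candidate number not to be the limit.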


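(4) We will prove that $\lim_{x \to 0} \sin\frac{1}{x}$ does not exist. We have not yet defined the trigonometric functions rigorously (we will do so in Section 7.3), but we assume that the reader is informally familiar with $\sin x$ and its basic properties, and if we assume these properties, we can give the following proof. See Figure 3.2.3 (ii) for the graph of the function $g\colon \mathbb{R} - \{0\} \to \mathbb{R}$ defined by $g(x) = \sin\frac{1}{x}$ for all $x \in \mathbb{R} - \{0\}$.


Suppose that $\lim_{x \to 0} \sin\frac{1}{x} = L$ for some $L \in \mathbb{R}$. Let $\varepsilon = \frac{1}{2}$. Then there is some $\delta > 0$ such that $x \in \mathbb{R} - \{0\}$ and $|x - 0| < \delta$ imply $\left|\sin\frac{1}{x} - L\right| < \frac{1}{2}$. By Corollary 2.6.8 (2) there is some $n \in \mathbb{N}$ such that $\frac{1}{n} < \delta$. Hence $\frac{1}{2\pi n} < \frac{1}{n} < \delta$. Let $x_1 = 2\pi n + \frac{\pi}{2}$ and $x_2 = 2\pi n + \frac{3\pi}{2}$. Then $0 < \frac{1}{x_1} < \delta$ and $0 < \frac{1}{x_2} < \delta$. Therefore

$$2 = |1 - (-1)| = |\sin x_1 - \sin x_2| = \left|\sin\frac{1}{1/x_1} - \sin\frac{1}{1/x_2}\right|$$

$$= \left|\sin\frac{1}{1/x_1} - L + L - \sin\frac{1}{1/x_2}\right| \le \left|\sin\frac{1}{1/x_1} - L\right| + \left|L - \sin\frac{1}{1/x_2}\right| < \frac{1}{2} + \frac{1}{2} = 1,$$

which is a contradiction. We conclude that $\lim_{x \to 0} \sin\frac{1}{x}$ does not exist. ◊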
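The two families of points used in Part (4) can be computed directly. The following sketch uses floating-point arithmetic and a helper name of our own, so the values of the sine function are only approximately 1 and −1; it shows that points arbitrarily close to 0 yield values of the function that stay a distance of about 2 apart.

```python
import math


def points(n):
    """Return (u, v) = (1/x1, 1/x2) with x1 = 2*pi*n + pi/2 and x2 = 2*pi*n + 3*pi/2."""
    x1 = 2 * math.pi * n + math.pi / 2
    x2 = 2 * math.pi * n + 3 * math.pi / 2
    return 1 / x1, 1 / x2


for n in (1, 10, 1000):
    u, v = points(n)
    # Both points lie between 0 and 1/n, yet sin(1/u) is near 1 and sin(1/v) is near -1.
    print(n, u, v, math.sin(1 / u), math.sin(1 / v))
```

Whatever number is proposed as the limit, it cannot be within distance one half of both 1 and −1, which is the heart of the contradiction in the proof.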


In Example 3.2.3 (3) (4) we saw two limits that do not exist. We observe, however, that these two limits do not exist for quite different reasons. In Figure 3.2.3 (i) we see that the problem is that as $x$ goes to 0, the function goes to infinity from one side and negative infinity from the other side, whereas in Part (ii) of the figure the function oscillates more and more rapidly with values between 1 and $-1$ as $x$ gets closer to 0, and hence there is no single value toward which the values of the function get closer as $x$ gets closer to 0.

The $\varepsilon$-$\delta$ definition of limits given in Definition 3.2.1 is not always easy to use in practice, and in the rest of this section we will see some lemmas and theorems that allow us to compute limits more easily in certain situations (the proofs of these results make use of the $\varepsilon$-$\delta$ approach, but once these results have been proved, we can often apply them without further use of $\varepsilon$-$\delta$). We start with some preliminary results.




Our next result states that if a function has a positive limit at c, then the function 
must be positive near c, and similarly for a negative limit. 


Theorem 3.2.4 (Sign-Preserving Property for Limits). Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be a function. Suppose that $\lim_{x \to c} f(x)$ exists.

1. If $\lim_{x \to c} f(x) > 0$, then there is some $M > 0$ and some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $f(x) > M$.

2. If $\lim_{x \to c} f(x) < 0$, then there is some $N < 0$ and some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $f(x) < N$.

Proof. We will prove Part (1); the other part is similar, and we omit the details.

(1) Suppose that $\lim_{x \to c} f(x) > 0$. Let $L = \lim_{x \to c} f(x)$. Let $M = \frac{L}{2}$. Then $M > 0$. By the definition of limits there is some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $|f(x) - L| < \frac{L}{2}$. Then $x \in I - \{c\}$ and $|x - c| < \delta$ imply that $-\frac{L}{2} < f(x) - L < \frac{L}{2}$, and hence $\frac{L}{2} < f(x)$, and therefore $f(x) > M$.


Recall from Definition 1.7.7 (3) or Definition 2.2.2 (3) the concept of a subset of $\mathbb{R}$ being bounded. It was proved in Exercise 2.3.11 that a subset $A \subseteq \mathbb{R}$ is bounded if and only if there is some $M \in \mathbb{R}$ such that $|x| \le M$ for all $x \in A$. We now define what
it means for functions to be bounded. Although this definition does not make use of 
limits, it will be useful in our study of limits, and hence we give it here. 

When discussing the boundedness of functions, we will see that the boundedness 
occurs entirely in the codomain, and hence our definition of bounded functions 
requires that the codomains of the functions under consideration are sets that are 
susceptible to the notion of boundedness, which for our purposes means that we need 
codomains that are subsets of the real numbers. The domain of a bounded function 
could be any type of set. 


Definition 3.2.5. Let $A$ be a set, let $B \subseteq \mathbb{R}$ be a set and let $f\colon A \to B$ be a function. The function $f$ is bounded if the set $f(A)$ is bounded; that is, if there is some $M \in \mathbb{R}$ such that $|f(x)| \le M$ for all $x \in A$. The number $M$ is called a bound of $f$. △


Example 3.2.6. Let $h\colon [0,1] \to \mathbb{R}$ be defined by

$$h(x) = \begin{cases} 1, & \text{if } x = 0 \\ \frac{1}{x}, & \text{if } x \in (0,1]. \end{cases}$$

It is intuitively evident that $h$ is not bounded by looking at its graph, but we need to provide a proof, which we do by showing that for each $M \in \mathbb{R}$, there is some $x \in [0,1]$ such that $M < |h(x)|$. Let $M \in \mathbb{R}$. There are now two cases. First, suppose that $M < 0$. Then $M < 1 = |h(0)|$. Second, suppose that $M \ge 0$. Let $x = M + 1$. Then $x \ge 1$, and hence $\frac{1}{x} \in (0,1]$. We then see that $\left|h\left(\frac{1}{x}\right)\right| = x > M$. Hence $h$ is not bounded. ◊
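The two cases in Example 3.2.6 translate directly into a small program that, for any proposed bound, produces a point of the closed interval from 0 to 1 at which the function exceeds that bound. This is a sketch under the assumption that the function equals 1 at 0 and equals the reciprocal of x elsewhere on the interval; the helper names are ours.

```python
from fractions import Fraction


def h(x):
    """Assumed form of h on [0, 1]: h(0) = 1, and h(x) = 1/x for 0 < x <= 1."""
    if x == 0:
        return Fraction(1)
    if 0 < x <= 1:
        return 1 / Fraction(x)
    raise ValueError("x must lie in [0, 1]")


def point_exceeding(M):
    """Return some x in [0, 1] with |h(x)| > M, following the two cases in the example."""
    if M < 0:
        return Fraction(0)            # |h(0)| = 1 > M
    return 1 / Fraction(M + 1)        # |h(1/(M + 1))| = M + 1 > M


for M in (-5, 0, 3, 1000):
    x = point_exceeding(M)
    assert 0 <= x <= 1 and abs(h(x)) > M
    print(M, x, h(x))
```

Because such a point exists for every proposed bound, no number can serve as a bound of the function.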


Observe that if a function is bounded, the bound of the function is not unique,
because any number larger than a bound is also a bound. Clearly any bound of a
function is non-negative, and it is always possible to choose a positive bound, which
we will do when convenient.

We now have two lemmas that involve limits and boundedness. Clearly a function 
can have a limit at some number c even if the function is not bounded, for example the 
function in Example 3.2.3 (1), and a function can be bounded but not have a limit at 
some number c. There are, however, some relations between limits and boundedness, 
as we now see. 


Lemma 3.2.7. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be a function. If $\lim_{x \to c} f(x)$ exists, then there is some $\delta > 0$ such that the restriction of $f$ to $(I - \{c\}) \cap (c - \delta, c + \delta)$ is bounded.


Proof. Let $L = \lim_{x \to c} f(x)$. Then there is some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $|f(x) - L| < 1$. Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. By Lemma 2.3.9 (7) we see that $|f(x)| - |L| < 1$, and hence $|f(x)| < |L| + 1$. Therefore the restriction of $f$ to $(I - \{c\}) \cap (c - \delta, c + \delta)$ is bounded, with bound $|L| + 1$.


Our next lemma is a useful result about limits that makes use of boundedness. The 
reader is asked in Exercise 3.2.4 to show that this hypothesis of boundedness in this 
lemma cannot be dropped. 


Lemma 3.2.8. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f, g\colon I - \{c\} \to \mathbb{R}$ be functions. Suppose that $\lim_{x \to c} f(x) = 0$, and that $g$ is bounded. Then $\lim_{x \to c} f(x)g(x) = 0$.


Proof. Let $\varepsilon > 0$. Because $g$ is bounded, there is some $M \in \mathbb{R}$ such that $|g(x)| \le M$ for all $x \in I - \{c\}$; we may assume that $M > 0$. Because $\lim_{x \to c} f(x) = 0$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $|f(x) - 0| < \frac{\varepsilon}{M}$. Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. Then

$$|f(x)g(x) - 0| = |f(x)g(x)| = |f(x)| \cdot |g(x)| < \frac{\varepsilon}{M} \cdot M = \varepsilon.$$


One of the most convenient situations in which to prove that a function has a 
limit is when the function is built up out of simpler functions, the limits of which 
we are more easily able to evaluate. The following definition, which will be useful 
throughout our study of real analysis, states how to add, subtract, multiply and divide 
functions. As was the case when we defined bounded functions in Definition 3.2.5, 
the addition, subtraction, multiplication and division of functions occurs entirely in 
the codomain, and hence the definition of addition, etc., of functions requires that the 
codomains of the functions under consideration are sets where we can perform such 
operations, which for our purposes means that we need codomains that are the real 
numbers. The domains of such functions could be any type of set. 


Definition 3.2.9. Let $A, B$ be sets, let $f\colon A \to \mathbb{R}$ and $g\colon B \to \mathbb{R}$ be functions and let $k \in \mathbb{R}$.

1. The function $f + g\colon A \cap B \to \mathbb{R}$ is defined by $[f + g](x) = f(x) + g(x)$ for all $x \in A \cap B$.

2. The function $f - g\colon A \cap B \to \mathbb{R}$ is defined by $[f - g](x) = f(x) - g(x)$ for all $x \in A \cap B$.

3. The function $kf\colon A \to \mathbb{R}$ is defined by $[kf](x) = kf(x)$ for all $x \in A$.

4. The function $fg\colon A \cap B \to \mathbb{R}$ is defined by $[fg](x) = f(x) \cdot g(x)$ for all $x \in A \cap B$.

5. Let $C = (A \cap B) - \{b \in B \mid g(b) = 0\}$. The function $\frac{f}{g}\colon C \to \mathbb{R}$ is defined by $\left[\frac{f}{g}\right](x) = \frac{f(x)}{g(x)}$ for all $x \in C$.

6. The function $|f|\colon A \to \mathbb{R}$ is defined by $|f|(x) = |f(x)|$ for all $x \in A$. △

We now see that limits behave nicely with respect to the addition, subtraction, 
multiplication and division of functions. The relation of limits to the absolute value of 
a function is discussed in Exercise 3.2.9. 


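Theorem 3.2.10. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$, let $f, g\colon I - \{c\} \to \mathbb{R}$ be functions and let $k \in \mathbb{R}$. Suppose that $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ exist.

1. $\lim_{x \to c} [f + g](x)$ exists and $\lim_{x \to c} [f + g](x) = \lim_{x \to c} f(x) + \lim_{x \to c} g(x)$.

2. $\lim_{x \to c} [f - g](x)$ exists and $\lim_{x \to c} [f - g](x) = \lim_{x \to c} f(x) - \lim_{x \to c} g(x)$.

3. $\lim_{x \to c} [kf](x)$ exists and $\lim_{x \to c} [kf](x) = k \lim_{x \to c} f(x)$.

4. $\lim_{x \to c} [fg](x)$ exists and $\lim_{x \to c} [fg](x) = [\lim_{x \to c} f(x)] \cdot [\lim_{x \to c} g(x)]$.

5. If $\lim_{x \to c} g(x) \ne 0$, then $\lim_{x \to c} \left[\frac{f}{g}\right](x)$ exists and $\lim_{x \to c} \left[\frac{f}{g}\right](x) = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)}$.
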

Proof. We will prove Parts (1), (3) and (4), leaving the rest to the reader in Exercise 3.2.6.

Let $L = \lim_{x \to c} f(x)$ and $M = \lim_{x \to c} g(x)$.

(1) Let $\varepsilon > 0$. Then there is some $\delta_1 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_1$ imply $|f(x) - L| < \frac{\varepsilon}{2}$, and there is some $\delta_2 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply $|g(x) - M| < \frac{\varepsilon}{2}$. Let $\delta = \min\{\delta_1, \delta_2\}$. Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. Then

$$|[f + g](x) - (L + M)| = |(f(x) - L) + (g(x) - M)| \le |f(x) - L| + |g(x) - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

(3) By Exercise 3.2.3 we know that $\lim_{x \to c} [f(x) - L] = 0$. Let $h\colon I - \{c\} \to \mathbb{R}$ be defined by $h(x) = k$ for all $x \in I - \{c\}$. Then $h$ is bounded, with bound $|k|$. Hence, Lemma 3.2.8 implies that $\lim_{x \to c} k[f(x) - L] = 0$, which then implies that $\lim_{x \to c} [kf(x) - kL] = 0$. Using Exercise 3.2.3 again we deduce that $\lim_{x \to c} kf(x) = kL$.

(4) Let $\varepsilon > 0$. By Lemma 3.2.7 there is some $\delta_1 > 0$ such that the restriction of $g$ to $(I - \{c\}) \cap (c - \delta_1, c + \delta_1)$ is bounded. Hence there is some $B \in \mathbb{R}$ such that $|g(x)| \le B$ for all $x \in (I - \{c\}) \cap (c - \delta_1, c + \delta_1)$. We may assume that $B > 0$. Then $B + |L| > 0$. There is some $\delta_2 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply $|f(x) - L| < \frac{\varepsilon}{B + |L|}$, and there is some $\delta_3 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_3$ imply $|g(x) - M| < \frac{\varepsilon}{B + |L|}$. Let $\delta = \min\{\delta_1, \delta_2, \delta_3\}$. Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. Then

$$|[fg](x) - LM| = |f(x)g(x) - LM| = |f(x)g(x) - g(x)L + g(x)L - LM|$$

$$\le |g(x)| \cdot |f(x) - L| + |L| \cdot |g(x) - M| < B \cdot \frac{\varepsilon}{B + |L|} + |L| \cdot \frac{\varepsilon}{B + |L|} = \varepsilon.$$

Example 3.2.11. Combining Theorem 3.2.10 (3) (4), Example 3.2.3 (2) and Exercise 3.2.1,
we see that

\begin{align*}
\lim_{x \to 3} (12x^3 + 8x^2 - 12x - 8) &= \lim_{x \to 3} [4(x^2 - 1)(3x + 2)] \\
&= 4 \cdot [\lim_{x \to 3} (x^2 - 1)] \cdot [\lim_{x \to 3} (3x + 2)] = 4 \cdot 8 \cdot 11 = 352.
\end{align*}
◊

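The computation in Example 3.2.11 can be sanity-checked numerically. The following is only an illustrative sketch, not part of the text's argument: the function names `poly`, `factored` and `sample_near` are hypothetical, and evaluating a polynomial at points near 3 suggests the value of the limit but does not prove it.

```python
def poly(x: float) -> float:
    """The expanded polynomial 12x^3 + 8x^2 - 12x - 8 from Example 3.2.11."""
    return 12 * x**3 + 8 * x**2 - 12 * x - 8


def factored(x: float) -> float:
    """The factored form 4(x^2 - 1)(3x + 2) of the same polynomial."""
    return 4 * (x**2 - 1) * (3 * x + 2)


def sample_near(c: float, steps: int = 6):
    """Return pairs (x, poly(x)) for x = c - 10^-n and x = c + 10^-n, n = 1..steps."""
    rows = []
    for n in range(1, steps + 1):
        h = 10.0 ** (-n)
        for x in (c - h, c + h):
            rows.append((x, poly(x)))
    return rows


if __name__ == "__main__":
    # The printed values should approach 4 * 8 * 11 = 352 as x approaches 3.
    for x, y in sample_near(3.0):
        print(f"x = {x:.6f}   poly(x) = {y:.6f}")
```

The two forms agree at every sampled point, and the sampled values settle near 352, which matches the product of the three limits used in the example.
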
The following theorem concerns the relation between limits and the composition 
of functions. 


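Theorem 3.2.12. Let $I, J \subseteq \mathbb{R}$ be open intervals, let $c \in I$, let $d \in J$ and let
$g\colon I - \{c\} \to J - \{d\}$ and $f\colon J - \{d\} \to \mathbb{R}$ be functions. Suppose that $\lim_{y \to c} g(y) = d$
and that $\lim_{x \to d} f(x)$ exists. Then $\lim_{y \to c} (f \circ g)(y)$ exists, and $\lim_{y \to c} (f \circ g)(y) = \lim_{x \to d} f(x)$.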


Proof. Left to the reader in Exercise 3.2.10. 


We now turn to the relation of limits to inequalities between functions. 


Theorem 3.2.13. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f, g\colon I - \{c\} \to \mathbb{R}$
be functions. Suppose that $f(x) \le g(x)$ for all $x \in I - \{c\}$. If $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$
exist, then $\lim_{x \to c} f(x) \le \lim_{x \to c} g(x)$.


Proof. Left to the reader in Exercise 3.2.11. 


Our next result provides a convenient way to prove the existence of the limit of a 
tricky function by “trapping it” between two functions whose limits can be evaluated 
more easily. The reader may be familiar with the version of this theorem that holds 
for sequences (which we will see in Theorem 8.2.12); indeed, many of the basic facts 
about limits of functions have analogs for limits of sequences, as the reader will see 
in Section 8.2. 


Theorem 3.2.14 (Squeeze Theorem for Functions). Let $I \subseteq \mathbb{R}$ be an open interval,
let $c \in I$ and let $f, g, h\colon I - \{c\} \to \mathbb{R}$ be functions. Suppose that $f(x) \le g(x) \le h(x)$
for all $x \in I - \{c\}$. If $\lim_{x \to c} f(x) = L = \lim_{x \to c} h(x)$ for some $L \in \mathbb{R}$, then $\lim_{x \to c} g(x)$ exists
and $\lim_{x \to c} g(x) = L$.


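Proof. Suppose that $\lim_{x \to c} f(x) = L = \lim_{x \to c} h(x)$ for some $L \in \mathbb{R}$. Let $\varepsilon > 0$. There is
some $\delta_1 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_1$ imply $|f(x) - L| < \varepsilon$, and there
is some $\delta_2 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply $|h(x) - L| < \varepsilon$. Let
$\delta = \min\{\delta_1, \delta_2\}$. Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. Then $|f(x) - L| < \varepsilon$ and
$|h(x) - L| < \varepsilon$. It follows that $L - \varepsilon < f(x) < L + \varepsilon$ and $L - \varepsilon < h(x) < L + \varepsilon$. Hence

\[ L - \varepsilon < f(x) \le g(x) \le h(x) < L + \varepsilon, \]

which implies $|g(x) - L| < \varepsilon$.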
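
As an informal illustration of how the proof of the Squeeze Theorem chooses $\delta$, the sketch below uses an example that is not taken from the text: $f(x) = -x^2$, $g(x) = x^2 \sin(1/x)$ and $h(x) = x^2$ on $\mathbb{R} - \{0\}$, with $c = 0$ and $L = 0$. For these particular bounding functions, $\delta = \sqrt{\varepsilon}$ works for both $f$ and $h$. The names `delta_for` and `check` are hypothetical, and sampling finitely many points is a plausibility check, not a proof.

```python
import math


def f(x: float) -> float:
    return -x * x


def h(x: float) -> float:
    return x * x


def g(x: float) -> float:
    # Trapped between f and h because |sin(1/x)| <= 1.
    return x * x * math.sin(1.0 / x)


def delta_for(eps: float) -> float:
    """A delta that works for both f and h at c = 0 with L = 0 (assumed example)."""
    return math.sqrt(eps)


def check(eps: float, samples: int = 1000) -> bool:
    """Sample points with 0 < |x| < delta; verify f <= g <= h and |g(x) - 0| < eps."""
    d = delta_for(eps)
    for i in range(1, samples + 1):
        t = d * i / (samples + 1)
        for x in (t, -t):
            if not (f(x) <= g(x) <= h(x)):
                return False
            if not abs(g(x)) < eps:
                return False
    return True


if __name__ == "__main__":
    for eps in (1e-1, 1e-3, 1e-6):
        print(eps, delta_for(eps), check(eps))
```
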


We conclude this section with a brief discussion of “one-sided” limits, which 
are limits at a number c that involve approaching c from only one side, either from 
the right (that is, via numbers larger than c) or from the left (that is, via numbers 
smaller than c). Such one-sided limits are useful at the endpoints of closed intervals. 
For example, consider the function $g\colon \mathbb{R} \to \mathbb{R}$ defined at the start of this section, the
graph of which is seen in Figure 3.2.1 (ii). Whereas the limit of this function does not 
exist as x goes to 0, if we restrict our attention to x > 0, then the limit of that part of 
the function as x goes to 0 ought to be 3; that is a right-hand limit. Similarly, if we 
restrict our attention to x < 0, then the limit of that part of the function as x goes to 0 
ought to be —3; that is a left-hand limit. 

Observe that the interval $I$ in the following definition is not necessarily open.


Definition 3.2.15. Let $I \subseteq \mathbb{R}$ be an interval, let $c \in I$, let $f\colon I - \{c\} \to \mathbb{R}$ be a function
and let $L \in \mathbb{R}$.


1. Suppose that $c$ is not a right endpoint of $I$. The number $L$ is the right-hand
limit of $f$ at $c$, written
\[ \lim_{x \to c^+} f(x) = L, \]
if for each $\varepsilon > 0$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $c < x < c + \delta$
imply $|f(x) - L| < \varepsilon$. If $\lim_{x \to c^+} f(x) = L$, we also say that $f$ converges to $L$ as
$x$ goes to $c$ from the right. If $f$ converges to some real number as $x$ goes to $c$
from the right, we say that $\lim_{x \to c^+} f(x)$ exists.

2. Suppose that $c$ is not a left endpoint of $I$. The number $L$ is the left-hand limit
of $f$ at $c$, written
\[ \lim_{x \to c^-} f(x) = L, \]
if for each $\varepsilon > 0$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $c - \delta < x < c$
imply $|f(x) - L| < \varepsilon$. If $\lim_{x \to c^-} f(x) = L$, we also say that $f$ converges to $L$ as
$x$ goes to $c$ from the left. If $f$ converges to some real number as $x$ goes to $c$
from the left, we say that $\lim_{x \to c^-} f(x)$ exists.

3. A one-sided limit is either a right-hand limit or a left-hand limit. △


Example 3.2.16. We examine each of $\lim_{x \to 0^+} \frac{|x|}{x}$, $\lim_{x \to 0^-} \frac{|x|}{x}$ and $\lim_{x \to 0} \frac{|x|}{x}$. The function
under consideration here is $f\colon \mathbb{R} - \{0\} \to \mathbb{R}$ defined by $f(x) = \frac{|x|}{x}$ for all $x \in \mathbb{R} - \{0\}$.
The key observation is that

\[ \frac{|x|}{x} = \begin{cases} 1, & \text{if } x > 0 \\ -1, & \text{if } x < 0. \end{cases} \]

It then follows that $\lim_{x \to 0^+} \frac{|x|}{x} = \lim_{x \to 0^+} 1 = 1$, and that $\lim_{x \to 0^-} \frac{|x|}{x} = \lim_{x \to 0^-} -1 = -1$. On the
other hand, we see that $\lim_{x \to 0} \frac{|x|}{x}$ does not exist, as follows. Suppose to the contrary that
$\lim_{x \to 0} \frac{|x|}{x} = L$ for some $L \in \mathbb{R}$. It cannot be the case that $L$ equals both $1$ and $-1$. Without
loss of generality, assume that $L \ne 1$. Let $\varepsilon = \frac{|1 - L|}{2}$. Then $\varepsilon > 0$. For any $\delta > 0$, we
can choose $y \in (0, \delta)$, and then $f(y) = 1$, which implies $|f(y) - L| = |1 - L| = 2\varepsilon > \varepsilon$.
Hence we cannot find the required $\delta$ for the given $\varepsilon$, which is a contradiction to the
assumption that $\lim_{x \to 0} \frac{|x|}{x} = L$. It follows that $\lim_{x \to 0} \frac{|x|}{x}$ does not exist. ◊
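
The argument in Example 3.2.16 can be mirrored in a short computation. This is a sketch under stated assumptions: the helper names `one_sided_samples` and `witness` are hypothetical, and `witness` simply packages the choice of $\varepsilon$ and $y$ made in the example (using the symmetric choice on the left side when $L = 1$).

```python
def f(x: float) -> float:
    """f(x) = |x| / x on R - {0}."""
    if x == 0:
        raise ValueError("f is not defined at 0")
    return abs(x) / x


def one_sided_samples(side: str, steps: int = 8):
    """Values of f at x = +10^-n (side='right') or x = -10^-n (side='left')."""
    sign = 1.0 if side == "right" else -1.0
    return [f(sign * 10.0 ** (-n)) for n in range(1, steps + 1)]


def witness(L: float, delta: float):
    """For a candidate limit L and any delta > 0, return (eps, y) with
    0 < |y| < delta and |f(y) - L| > eps, as in the example's contradiction."""
    if L != 1:
        eps = abs(1 - L) / 2      # the choice made in the example
        y = delta / 2             # a point in (0, delta), where f(y) = 1
    else:
        eps = abs(-1 - L) / 2     # symmetric choice when L = 1
        y = -delta / 2            # a point in (-delta, 0), where f(y) = -1
    return eps, y


if __name__ == "__main__":
    print(one_sided_samples("right"))
    print(one_sided_samples("left"))
    print(witness(0.0, 1e-3))
```
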


Example 3.2.16 shows that $\lim_{x \to c^+} f(x)$ and $\lim_{x \to c^-} f(x)$ can both exist and be different.
We now see what happens when both of these one-sided limits exist and are equal.


Lemma 3.2.17. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be
a function. Then $\lim_{x \to c} f(x)$ exists if and only if $\lim_{x \to c^+} f(x)$ and $\lim_{x \to c^-} f(x)$ exist and are
equal, and if these three limits exist then they are equal.


Proof. Suppose that $\lim_{x \to c} f(x)$ exists. Let $L = \lim_{x \to c} f(x)$. Let $\varepsilon > 0$. Then there is some
$\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $|f(x) - L| < \varepsilon$. Suppose that
$x \in I - \{c\}$ and $c < x < c + \delta$. Then $|x - c| < \delta$, and it follows that $|f(x) - L| < \varepsilon$.
We deduce that $\lim_{x \to c^+} f(x) = L$. A similar argument shows that $\lim_{x \to c^-} f(x) = L$, and we
omit the details.

Now suppose that $\lim_{x \to c^+} f(x)$ and $\lim_{x \to c^-} f(x)$ exist and are equal. Let $M = \lim_{x \to c^+} f(x)
= \lim_{x \to c^-} f(x)$. Let $\varepsilon > 0$. Then there is some $\delta_1 > 0$ such that $x \in I - \{c\}$ and $c < x < c + \delta_1$
imply $|f(x) - M| < \varepsilon$, and there is some $\delta_2 > 0$ such that $x \in I - \{c\}$ and $c - \delta_2 < x < c$
imply $|f(x) - M| < \varepsilon$. Let $\delta = \min\{\delta_1, \delta_2\}$. Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$.
Then $c - \delta < x < c + \delta$ and $x \ne c$, and hence $c - \delta_2 < x < c$ or $c < x < c + \delta_1$. In
either case we deduce that $|f(x) - M| < \varepsilon$. It follows that $\lim_{x \to c} f(x) = M$.


It can be verified that the analogs for one-sided limits of the lemmas and theorems 
of this section all hold; the proofs are not substantially different in the one-sided case, 
and we omit the details. 


Reflections 


The reader has now had her first taste of what real analysis is all about. The $\varepsilon$-$\delta$
definition of limits of functions is at the very heart of any real analysis course, for two
reasons. First, limits are the conceptual basis for continuity, derivatives, and more,
and hence an understanding of limits is crucial for an understanding of much of real
analysis. It could be said that it is the limit concept that distinguishes analysis from
algebra. Second, the $\varepsilon$-$\delta$ definition of limits of functions is the model for a number of
other similar definitions, for example the definition of limits of sequences, and even
the definition of the Riemann integral of a function, which is a more complicated
definition but still involves $\varepsilon$ and $\delta$ in analogous roles. Hence, mastering $\varepsilon$-$\delta$ proofs
will serve the student well as preparation for the rest of the material in this text.

Some students find $\varepsilon$-$\delta$ proofs a bit confusing upon first encounter, especially in
comparison with the material in other junior–senior-level mathematics courses such
as abstract algebra. In the author's view, the problem stems from the prominent role
of the quantifiers in $\varepsilon$-$\delta$ proofs. More generally, in the author's experience teaching a
variety of proofs-based undergraduate mathematics courses, problems with quantifiers,
whether due to misunderstanding or carelessness, are the source of the majority of
errors that students make in the construction of rigorous proofs. In the particular case
of the $\varepsilon$-$\delta$ definition of limits of functions, it is the existence of two quantifiers, in
a particular order, that makes the difficulty in mastering the quantifiers that much
greater. Fortunately, the author's experience has also led him to see that if students put
in sufficient effort and care practicing $\varepsilon$-$\delta$ proofs, they can usually learn to formulate
and write such proofs very nicely.


Exercises 


Exercise 3.2.1. [Used throughout.] Let $m, b, c \in \mathbb{R}$. Using only the definition of limits,
prove that $\lim_{x \to c} (mx + b) = mc + b$.


Exercise 3.2.2. Using only the definition of limits, prove that each of the following 
limits holds. 


(1) lim (x? +3x+5) =9. 
x= 


: 2 
(2) lim ==2 = 6. 


Exercise 3.2.3. [Used in Theorem 3.2.10.] Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$, let
$f\colon I - \{c\} \to \mathbb{R}$ be a function and let $L \in \mathbb{R}$. Using only the definition of limits, prove
that $\lim_{x \to c} f(x) = L$ if and only if $\lim_{x \to c} [f(x) - L] = 0$.


Exercise 3.2.4. [Used in Section 3.2.] Find an example of functions $f, g\colon \mathbb{R} - \{0\} \to \mathbb{R}$
such that $\lim_{x \to 0} f(x) = 0$, and that $\lim_{x \to 0} [fg](x)$ does not exist.


Exercise 3.2.5. [Used throughout.] Let $J \subseteq I \subseteq \mathbb{R}$ be open intervals, let $c \in J$ and
let $f\colon I - \{c\} \to \mathbb{R}$ be a function. Prove that $\lim_{x \to c} f(x)$ exists if and only if $\lim_{x \to c} f|_{J - \{c\}}(x)$
exists, and if these limits exist, then they are equal.

Exercise 3.2.6. [Used in Theorem 3.2.10.] Prove Theorem 3.2.10 (2) (5). 


Exercise 3.2.7. [Used in Example 4.2.5.] Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and
let $f, g\colon I - \{c\} \to \mathbb{R}$ be functions. Suppose that $\lim_{x \to c} f(x)$ exists and that $\lim_{x \to c} g(x)$ does
not exist.


(1) Prove that $\lim_{x \to c} [f + g](x)$ does not exist.
(2) Prove that if $\lim_{x \to c} f(x) \ne 0$, then $\lim_{x \to c} [fg](x)$ does not exist.


Exercise 3.2.8. [Used in Example 4.6.1.] Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and
let $f\colon I - \{c\} \to \mathbb{R}$ be a function. Suppose that $f(x) \ne 0$ for all $x \in I - \{c\}$, and that
$\lim_{x \to c} f(x) = 0$. Prove that $\lim_{x \to c} \frac{1}{f(x)}$ does not exist. [Use Exercise 3.2.1.]


Exercise 3.2.9. [Used in Section 3.2, Exercise 4.2.4 and Theorem 6.3.3.] Let $I \subseteq \mathbb{R}$ be
an open interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be a function.


(1) Let $L \in \mathbb{R}$. Prove that if $\lim_{x \to c} f(x) = L$, then $\lim_{x \to c} |f(x)| = |L|$.
(2) Prove that $\lim_{x \to c} f(x) = 0$ if and only if $\lim_{x \to c} |f(x)| = 0$.
(3) Find an example of a function $g\colon \mathbb{R} \to \mathbb{R}$ such that $\lim_{x \to c} |g(x)| = 1$, but that
$\lim_{x \to c} g(x)$ does not exist.


Exercise 3.2.10. [Used in Theorem 3.2.12.] Prove Theorem 3.2.12. 
Exercise 3.2.11. [Used in Theorem 3.2.13.] Prove Theorem 3.2.13. 
Exercise 3.2.12. [Used in Lemma 4.4.1 and Theorem 4.5.2.]
(1) Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be a function.
Suppose that $\lim_{x \to c} f(x)$ exists. Prove that if $f(x) \ge 0$ for all $x \in I - \{c\}$, then
$\lim_{x \to c} f(x) \ge 0$.
(2) If we were to assume that $f(x) > 0$ for all $x \in I - \{c\}$ as the hypothesis of
Part (1) of this exercise, would it be possible to conclude that $\lim_{x \to c} f(x) > 0$?
Give a proof or a counterexample.

Exercise 3.2.13. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f, g\colon I - \{c\} \to \mathbb{R}$
be functions. Suppose that $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ exist, and that for each $\varepsilon > 0$, there
is some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $|f(x) - g(x)| < \varepsilon$. Prove
that $\lim_{x \to c} f(x) = \lim_{x \to c} g(x)$.


Exercise 3.2.14. [Used in Theorem 5.8.5.] Let $A \subseteq \mathbb{R}$ be a set, and let $f, g, h\colon A \to \mathbb{R}$
be functions. Suppose that $f$ and $h$ are bounded, and that $f(x) \le g(x) \le h(x)$ for all
$x \in A$. Prove that $g$ is bounded.


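Exercise 3.2.15. [Used in Exercise 5.4.15.] Let $A \subseteq \mathbb{R}$ be a set, let $f\colon A \to \mathbb{R}$ be a
function and let $k \in \mathbb{R}$. Suppose that $f$ is bounded.


(1) Prove that $kf$ is bounded.
(2) Prove that if $k > 0$, then
\[ \operatorname{lub}\{[kf](x) \mid x \in A\} = k \cdot \operatorname{lub}\{f(x) \mid x \in A\}, \]
and
\[ \operatorname{glb}\{[kf](x) \mid x \in A\} = k \cdot \operatorname{glb}\{f(x) \mid x \in A\}. \]
(3) Prove that if $k < 0$, then
\[ \operatorname{lub}\{[kf](x) \mid x \in A\} = k \cdot \operatorname{glb}\{f(x) \mid x \in A\}, \]
and
\[ \operatorname{glb}\{[kf](x) \mid x \in A\} = k \cdot \operatorname{lub}\{f(x) \mid x \in A\}. \]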


Exercise 3.2.16. [Used in Exercise 5.4.16.] Let $A \subseteq \mathbb{R}$ be a non-empty set, and let
$f, g\colon A \to \mathbb{R}$ be functions. Suppose that $f$ and $g$ are bounded.


(1) Prove that $f + g$ is bounded.
(2) Prove that
\[ \operatorname{lub}\{[f + g](x) \mid x \in A\} \le \operatorname{lub}\{f(x) \mid x \in A\} + \operatorname{lub}\{g(x) \mid x \in A\}. \]
(3) Find an example where the inequality in Part (2) of this exercise is strict.
(4) Prove that
\[ \operatorname{glb}\{[f + g](x) \mid x \in A\} \ge \operatorname{glb}\{f(x) \mid x \in A\} + \operatorname{glb}\{g(x) \mid x \in A\}. \]
(5) Find an example where the inequality in Part (4) of this exercise is strict.


Exercise 3.2.17. [Used in Section 3.2 and Section 3.3.] In the definition of limits in
Definition 3.2.1, we looked at limits of the form $\lim_{x \to c} f(x)$ for functions $f\colon I - \{c\} \to \mathbb{R}$,
where $I$ is an open interval. The purpose of this exercise is to discuss whether it would
be plausible to define limits for functions with other types of domains; that is, for
functions $f\colon I - \{c\} \to \mathbb{R}$ where $I$ is not necessarily an interval.


(1) Let $I = [1, 2] \cup \{0\}$, and let $f\colon I - \{0\} \to \mathbb{R}$ be defined by $f(x) = x^2$ for all
$x \in I - \{0\}$. If Definition 3.2.1 is used as stated with this function $f$ and
with $c = 0$, prove that $\lim_{x \to 0} f(x) = r$ for every $r \in \mathbb{R}$. We conclude that
Definition 3.2.1 does not work with arbitrary sets $I$.

(2) Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $c \in \mathbb{R}$. The number $c$ is an accumulation
point of $A$ if for every $\delta > 0$ there is some $x \in A - \{c\}$ such that $|x - c| < \delta$.
Let $I \subseteq \mathbb{R}$ be an open interval, and let $c \in I$. Prove that $c$ is an accumulation
point of $I - \{c\}$.

(3) Let $A \subseteq \mathbb{R}$ be a non-empty set, let $c \in \mathbb{R}$ and let $f\colon A - \{c\} \to \mathbb{R}$ be a function.
(Note that it is not necessarily the case that $c \in A$.) Suppose that $c$ is an
accumulation point of $A$. Prove that if Definition 3.2.1 is used as stated with
this function $f$ and this number $c$, and if $\lim_{x \to c} f(x) = L$ for some $L \in \mathbb{R}$,
then $L$ is unique. It would therefore be possible to rewrite Definition 3.2.1
with the more general hypotheses that $I$ is an arbitrary non-empty set, and
with $c$ an accumulation point of $I$.

(4) Let $B, D \subseteq \mathbb{R}$ be non-empty sets, let $c \in \mathbb{R}$ and let $f\colon B \cup D - \{c\} \to \mathbb{R}$ be a
function. Suppose that $c$ is an accumulation point of each of $B$ and $D$. Then
$c$ is an accumulation point of $B \cup D$. Prove that $\lim_{x \to c} f(x)$ exists if and only if
$\lim_{x \to c} f|_{B - \{c\}}(x)$ and $\lim_{x \to c} f|_{D - \{c\}}(x)$ exist and are equal, and if these three limits
exist then they are equal. (Observe that Lemma 3.2.17 is a special case of this
more general result.)

Exercise 3.2.18. [Used in Exercise 3.4.9 and Section 8.3.] The definition of the limit
of a function (Definition 3.2.1) is a very important and very useful definition, but it
has one drawback, which is that in order to prove that a function has a limit, one first
needs to guess the value of the limit. It would be nice to be able to prove that a limit
exists without having to make such a guess. The purpose of this exercise is to prove a
result that states, intuitively, that $\lim_{x \to c} f(x)$ exists if and only if $f(x)$ and $f(y)$ get closer
and closer to each other as $x$ and $y$ get closer and closer to $c$. (This characterization is
more interesting in theory than it is useful in practice, though there is a much more
well-known and widely used analogous characterization of the limits of sequences,
called the Cauchy Completeness Theorem (Corollary 8.3.16).) This characterization
of limits is as follows.

Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be a function.
Then $\lim_{x \to c} f(x)$ exists if and only if for each $\varepsilon > 0$, there is some $\delta > 0$ such that
$x, y \in I - \{c\}$ and $|x - c| < \delta$ and $|y - c| < \delta$ imply $|f(x) - f(y)| < \varepsilon$.


We will prove this result in a few steps.


(1) Suppose that $\lim_{x \to c} f(x)$ exists. Prove that for each $\varepsilon > 0$, there is some $\delta > 0$
such that $x, y \in I - \{c\}$ and $|x - c| < \delta$ and $|y - c| < \delta$ imply $|f(x) - f(y)| < \varepsilon$.
(2) For the rest of this exercise, suppose that for each $\varepsilon > 0$, there is some $\delta > 0$
such that $x, y \in I - \{c\}$ and $|x - c| < \delta$ and $|y - c| < \delta$ imply $|f(x) - f(y)| < \varepsilon$.
For each $r > 0$, let $A_r = (I - \{c\}) \cap (c - r, c + r)$. Hence, for each $\varepsilon > 0$, there
is some $\delta > 0$ such that $x, y \in A_\delta$ implies $|f(x) - f(y)| < \varepsilon$. Prove that there
is some $\eta > 0$ such that $f(A_\eta)$ is bounded.
(3) For each $s \in (0, \eta)$, we note that $f(A_s) \subseteq f(A_\eta)$, and hence $f(A_s)$ is bounded.
We can therefore define $a_s = \operatorname{glb} f(A_s)$ and $b_s = \operatorname{lub} f(A_s)$. Let $A = \{a_s \mid s \in
(0, \eta)\}$ and $B = \{b_s \mid s \in (0, \eta)\}$. By Exercise 2.6.11 we see that $A$ has a least
upper bound and $B$ has a greatest lower bound, and that $\operatorname{lub} A \le \operatorname{glb} B$. Prove
that $\operatorname{lub} A = \operatorname{glb} B$. Use the No Gap Lemma (Lemma 2.6.6).
(4) Let $M = \operatorname{lub} A = \operatorname{glb} B$. Prove that $\lim_{x \to c} f(x) = M$.


3.3 Continuity 


The idea of a continuous function is quite familiar intuitively, and is often described
as a function whose graph $y = f(x)$ can "be drawn without lifting the pencil from
the paper." That is, a function is continuous if its graph has no "gaps" or "jumps."
The graph seen in Figure 3.3.1 (i) represents a continuous function. The graph seen
in Figure 3.3.1 (ii) represents a discontinuous function; this function has only one
place at which it is not continuous, namely, at $x = 0$, but that is sufficient for the
whole function to be considered discontinuous. Many of the familiar functions treated
in calculus, such as polynomials, $e^x$, $\ln x$, $\sin x$ and $\cos x$, are continuous. On the
other hand, we cannot ignore discontinuous functions, because some applications
of mathematics in the sciences and engineering require the use of discontinuous
functions (for example, the description of an electric circuit that has an open switch
up until a certain point in time, at which point the switch is closed). Also, although
discontinuous functions are not differentiable, as we will see by Theorem 4.2.4, it
will turn out that some (though not all) discontinuous functions are integrable, as we
will see in Example 5.2.6 (2) (3) (4), and in more generality in Theorem 5.8.5.


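[Fig. 3.3.1. Panel (i): graph of a continuous function. Panel (ii): graph of a function that is discontinuous at $x = 0$.]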


A rigorous treatment of continuity involves the same type of $\varepsilon$-$\delta$ arguments that
are used in the rigorous treatment of the limits of functions. Indeed, as we will see
in Lemma 3.3.2, there is a very close relationship between limits and continuity.
There is, however, one fundamental difference between the definitions of limits and
continuity. Suppose that we have an open interval $I \subseteq \mathbb{R}$, a number $c \in I$ and a function
$f\colon I \to \mathbb{R}$. If we want to find whether the limit of $f$ as $x$ goes to $c$ exists, we do not
take into account the value of $f(c)$. Indeed, the function $f$ need not be defined at $c$
for this limit to exist. To study the continuity of $f$ at $c$, by contrast, we are very much
concerned with the value of $f(c)$. To say that $f$ is continuous at $c$, we need to know
that, intuitively, the function "does not jump" at $c$. More precisely, we need to know
that the value of $f(c)$ is just what we would expect it to be if we looked at the values
of $f(x)$ as $x$ goes to $c$. In other words, to say that $f$ is continuous at $c$, we take the $\varepsilon$-$\delta$
definition of limits given in Definition 3.2.1, and we replace "$L$" with "$f(c)$."


Definition 3.3.1. Let $A \subseteq \mathbb{R}$ be a set, and let $f\colon A \to \mathbb{R}$ be a function.


1. Let $c \in A$. The function $f$ is continuous at $c$ if for each $\varepsilon > 0$, there is some
$\delta > 0$ such that $x \in A$ and $|x - c| < \delta$ imply $|f(x) - f(c)| < \varepsilon$. The function $f$
is discontinuous at $c$ if $f$ is not continuous at $c$; in that case we also say that
$f$ has a discontinuity at $c$.

2. The function $f$ is continuous if it is continuous at every number in $A$. The
function $f$ is discontinuous if it is not continuous. △

The reader will have noticed that in the definition of limits we restrict attention
to functions with domains that are open intervals with a number removed, whereas 
in the definition of continuous functions we allow domains that are arbitrary subsets 
of R. If we are taking the limit at a number c € R, we need to be sure that there are 
values of x in the domain of the function that really do get “closer and closer” to c; 
having c be in an open interval guarantees just that (see Exercise 3.2.17 for further 
discussion of this issue). By contrast, to have the most general possible definition of 
continuous functions, we allow domains that are arbitrary. It could happen that the 
domain of a function has an “isolated” point, in which case the function is always 
continuous at that point, no matter what the value of the function is at that point; see 
Exercise 3.3.9 for details. Such behavior at an isolated point might seem somewhat 
strange, but it does not cause any problems, and it allows for the greatest generality. 

When we restrict attention to functions with domains that are open intervals, 
then we see in the following lemma how closely related the concepts of limits and 
continuity are. This lemma follows immediately from the definition of limits and 
continuity, and we omit the proof. 


Lemma 3.3.2. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$ be a function.
Then $f$ is continuous at $c$ if and only if $\lim_{x \to c} f(x)$ exists and $\lim_{x \to c} f(x) = f(c)$.


We now have some examples of continuous and discontinuous functions. 


Example 3.3.3. 


(1) Let $A \subseteq \mathbb{R}$ be a set, and let $f\colon A \to \mathbb{R}$ be defined by $f(x) = mx + b$ for all
$x \in A$, where $m, b \in \mathbb{R}$. We will prove that $f$ is continuous. It would be possible to
do this proof using an $\varepsilon$-$\delta$ argument, but we can avoid such an argument by using
Exercise 3.2.1. First, suppose that $A = \mathbb{R}$. Then $A$ is an open interval, and it follows
from Exercise 3.2.1 and Lemma 3.3.2 that $f$ is continuous. Second, if $A$ is an arbitrary
subset of $\mathbb{R}$, we use the previous case together with Exercise 3.3.2 (2) to deduce that
$f$ is continuous.

(2) Let $B \subseteq \mathbb{R} - \{0\}$ be a set, and let $p\colon B \to \mathbb{R}$ be defined by $p(x) = \frac{1}{x}$ for all
$x \in B$. We will prove that $p$ is continuous. As was the case with some of our proofs
involving limits, we will first do scratch work prior to the actual proof.


Scratch Work. We work backwards for our scratch work. Let $c \in B$. There are two
cases, depending upon whether $c > 0$ or $c < 0$. We will consider the former case; the
latter case is very similar, and the details are left to the reader. We want $\left|\frac{1}{x} - \frac{1}{c}\right| < \varepsilon$,
which is $\frac{|c - x|}{|x| c} < \varepsilon$, which is $|x - c| < \varepsilon |x| c$. The appearance of "$x$" in the right-hand
side of this last inequality is a problem, because "$\delta$" cannot depend upon $x$. To remedy
this situation, we want to impose a positive lower bound on the values of $|x|$, and
the only way to do that is by our choice of $\delta$. One way to obtain this lower bound
is as follows. Suppose that $|x - c| < \frac{c}{2}$. Then $-\frac{c}{2} < x - c < \frac{c}{2}$, so $\frac{c}{2} < x < \frac{3c}{2}$, and
hence $\frac{c}{2} < |x| < \frac{3c}{2}$. When this restriction on $x$ holds, then $\varepsilon |x| c > \frac{\varepsilon c^2}{2}$. We can then
use $\delta = \min\left\{\frac{c}{2}, \frac{\varepsilon c^2}{2}\right\}$.

Actual Proof. Let $c \in B$. We will prove that $p$ is continuous at $c$. There are two
cases. First, suppose that $c > 0$. Let $\varepsilon > 0$. Let $\delta = \min\left\{\frac{c}{2}, \frac{\varepsilon c^2}{2}\right\}$. Suppose that $x \in B$
and $|x - c| < \delta$. Then $|x - c| < \frac{c}{2}$, and hence $-\frac{c}{2} < x - c < \frac{c}{2}$, which implies that
$\frac{c}{2} < x < \frac{3c}{2}$, and therefore $\frac{c}{2} < |x|$. It then follows that

\[ |p(x) - p(c)| = \left|\frac{1}{x} - \frac{1}{c}\right| = \frac{|c - x|}{|x| c} < \frac{\varepsilon c^2 / 2}{(c/2) \cdot c} = \varepsilon. \]

We conclude that $p$ is continuous at $c$. Second, suppose that $c < 0$. This case is similar
to the previous case, and we omit the details. Because $c$ was chosen arbitrarily, we
conclude that $p$ is continuous.
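
The choice $\delta = \min\{c/2, \varepsilon c^2/2\}$ from the proof above, for the case $c > 0$, can be tried out numerically. The sketch below is an informal check only: the names `delta_for` and `check` are hypothetical, and the sample points are kept slightly inside $(c - \delta, c + \delta)$ so that floating-point rounding does not blur the strict inequality.

```python
import random


def p(x: float) -> float:
    return 1.0 / x


def delta_for(c: float, eps: float) -> float:
    """delta = min{c/2, eps * c^2 / 2}, the choice made in the proof for c > 0."""
    if c <= 0 or eps <= 0:
        raise ValueError("this sketch covers only c > 0 and eps > 0")
    return min(c / 2, eps * c * c / 2)


def check(c: float, eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Sample x with |x - c| < delta and test whether |p(x) - p(c)| < eps."""
    rng = random.Random(seed)
    d = delta_for(c, eps)
    for _ in range(trials):
        x = c + 0.999 * rng.uniform(-d, d)   # stay strictly inside the delta-window
        if abs(p(x) - p(c)) >= eps:
            return False
    return True


if __name__ == "__main__":
    for c, eps in ((2.0, 0.1), (0.01, 1e-3), (5.0, 1.0)):
        print(c, eps, delta_for(c, eps), check(c, eps))
```

Note how small $\delta$ must become when $c$ is close to $0$: the factor $c^2$ reflects how steep the graph of $p$ is near the origin.
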


(3) We assume that the reader is informally familiar with the standard elementary 
functions (that is, polynomials, power functions, logarithms, exponentials and trigono- 
metric functions). All of these functions are continuous. We have not yet defined 
these functions rigorously, and so we cannot prove that they are continuous at this 
point, but for now we will assume the continuity (and other standard properties) of 
these functions for the sake of examples. (We will treat these functions rigorously in 
Chapter 7, and in that chapter it will be seen that these functions are indeed continuous, 
which will be proved by showing that they are differentiable, and then using the fact 
that differentiable functions are continuous, as seen in Theorem 4.2.4.) 

While we are discussing the continuity of the familiar elementary functions from
an informal point of view, here is a very interesting function that uses the sine function
in its definition. Let $k\colon \mathbb{R} \to \mathbb{R}$ be defined by

\[ k(x) = \begin{cases} \sin \frac{1}{x}, & \text{if } x \ne 0 \\ 0, & \text{if } x = 0. \end{cases} \]

The graph of $k$ is seen in Figure 3.2.3 (ii). We saw in Example 3.2.3 (4) that $\lim_{x \to 0} k(x)$
does not exist. It follows from Lemma 3.3.2 that $k$ is discontinuous at $0$. (The function
$k$ is continuous at all other real numbers, but we omit the details.)

(4) Is $\tan x$ continuous? A look at the graph of $y = \tan x$ might lead one to think
that $\tan x$ is not continuous, because it has vertical asymptotes at $x = \frac{\pi}{2} + n\pi$ for
all $n \in \mathbb{Z}$. However, it turns out that $\tan x$ is continuous. The issue is the domain
of $\tan x$. The definition of a function formally includes its domain and codomain,
and simply writing out a formula for the function (such as "$f(x) = \tan x$") does not
rigorously define a function. In the case of some familiar functions, such as $\sin x$
and $e^x$, we take it as known that the domain and codomain are both $\mathbb{R}$. Observe,
however, that $\tan x$ is not defined at $x = \frac{\pi}{2} + n\pi$ for any $n \in \mathbb{Z}$. The correct domain of
$\tan x$ is $\cdots \cup \left(-\frac{\pi}{2}, \frac{\pi}{2}\right) \cup \left(\frac{\pi}{2}, \frac{3\pi}{2}\right) \cup \cdots$, and on this domain, the function $\tan x$ is indeed
continuous. A proof of this fact would require first proving rigorously that $\sin x$ and
$\cos x$ are continuous on $\mathbb{R}$, and then using the fact that $\tan x = \frac{\sin x}{\cos x}$ for all $x$ in the
domain of $\tan x$, combined with Theorem 3.3.5 (5), which will be proved later in this
section.


(5) Let $g\colon \mathbb{R} \to \mathbb{R}$ be defined by

\[ g(x) = \begin{cases} \frac{|x|}{x}, & \text{if } x \ne 0 \\ 0, & \text{if } x = 0. \end{cases} \]

The function $g$ is discontinuous at $0$, though it is continuous everywhere else. To
verify this fact, we observe that

\[ g(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -1, & \text{if } x < 0. \end{cases} \]

To see that $g$ is discontinuous at $0$, it would be possible to use the $\varepsilon$-$\delta$ definition of
continuity directly, but we can save some effort by observing that in Example 3.2.16
we saw that $\lim_{x \to 0} g(x)$ does not exist, and then applying Lemma 3.3.2. It is intuitively
clear that $g$ is continuous at all numbers in $\mathbb{R} - \{0\}$; the proof involves an $\varepsilon$-$\delta$
argument, which is left to the reader.
(6) Let $r\colon [0, 1] \to \mathbb{R}$ be defined by

\[ r(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0, 1] \\ 0, & \text{otherwise.} \end{cases} \]

We will prove that $r$ is discontinuous everywhere. Let $c \in [0, 1]$. Suppose that $r$ is
continuous at $c$. Then there is some $\delta > 0$ such that $x \in [0, 1]$ and $|x - c| < \delta$ imply
$|r(x) - r(c)| < \frac{1}{2}$. Because $c \in [0, 1]$, there is some $\eta > 0$ such that $(c - \eta, c) \subseteq [0, 1]$
or $(c, c + \eta) \subseteq [0, 1]$; we cannot be sure that both of these are true, because $c$ might be
one of the endpoints of $[0, 1]$. Without loss of generality, assume that $(c, c + \eta) \subseteq [0, 1]$.
Let $\tau = \min\{\delta, \eta\}$.

There are now two cases. First, suppose that $c$ is rational. We know by Theorem 2.6.13 (2)
that there is some irrational $y \in \mathbb{R}$ such that $c < y < c + \tau$. It follows
that $y \in [0, 1]$ and $|y - c| < \delta$. Hence $|r(y) - r(c)| < \frac{1}{2}$. From the definition of the function
$r$ it follows that $|0 - 1| < \frac{1}{2}$, which is a contradiction. Second, suppose that $c$ is
irrational. A similar contradiction can be obtained, this time using Theorem 2.6.13 (1).
We deduce that $r$ is not continuous at $c$.

(7) For this next example, we need to use the fact that every non-negative rational
number can be expressed uniquely as a fraction in "lowest terms," which means as a
fraction $\frac{a}{b}$ such that $a \in \mathbb{N} \cup \{0\}$ and $b \in \mathbb{N}$, and that $a$ and $b$ have no common factors
other than $1$ and $-1$. (Observe that $0$ expressed in lowest terms is $\frac{0}{1}$, because every
integer is a factor of $0$.) The reader is informally familiar with this fact from years
of experience with fractions. This fact can be proved rigorously starting from the
basic properties of the integers that we have seen, though working through the details
would take us too far afield, and so we will not provide such a proof. For details,
the reader can either derive the desired fact about fractions from the Fundamental
Theorem of Arithmetic, found for example in [Ros05, Section 3.5], or see a proof in
[Olm62, Sections 402 and 404]. We will be using this fact about fractions in lowest
terms only in the present example, and in subsequent examples that rely upon this
one, but not in any proofs of theorems.

Let $s\colon [0, 1] \to \mathbb{R}$ be defined by

\[ s(x) = \begin{cases} \frac{1}{q}, & \text{if } x \in \mathbb{Q} \cap [0, 1] \text{ and } x = \frac{p}{q} \text{ in lowest terms, where } p \in \mathbb{N} \cup \{0\} \text{ and } q \in \mathbb{N} \\ 0, & \text{otherwise.} \end{cases} \]

We will prove that $s$ is discontinuous at every rational number in $[0, 1]$, and continuous
at every irrational number in $[0, 1]$. The strange behavior of this function is somewhat
counterintuitive, and it is due to the existence of such strange functions that we need
to have rigorous definitions and proofs for concepts such as continuity, because our
intuition about such matters might not always be correct.

The proof that $s$ is discontinuous at every rational number in $[0, 1]$ is very similar
to the proof that the function $r$ in Part (6) of this example is discontinuous everywhere,
and the details are left to the reader. The more interesting part of the proof is that $s$ is
continuous at every irrational number in $[0, 1]$.

Let $c \in [0, 1]$. Suppose that $c$ is irrational. Let $\varepsilon > 0$. By Corollary 2.6.8 (2) there
is some $m \in \mathbb{N}$ such that $\frac{1}{m} < \varepsilon$. There are only finitely many rational numbers in $[0, 1]$
that have denominator $m$ or smaller when expressed in lowest terms; let $q_1, \ldots, q_k \in
[0, 1]$ be these rational numbers. Let $\delta = \min\{|c - q_1|, \ldots, |c - q_k|\}$. Because $c$ is
irrational, it follows that $\delta > 0$.

Suppose that $x \in [0, 1]$ and $|x - c| < \delta$. There are now two cases. First, suppose
that $x$ is irrational. Then $|s(x) - s(c)| = |0 - 0| < \varepsilon$. Second, suppose that $x$ is rational.
Then $x = \frac{a}{b}$ for some $a \in \mathbb{N} \cup \{0\}$ and $b \in \mathbb{N}$, where $\frac{a}{b}$ is in lowest terms. By the
choice of $\delta$, we know that $x \ne q_i$ for all $i \in \{1, \ldots, k\}$. Hence $b > m$, and it follows
that $|s(x) - s(c)| = \left|\frac{1}{b} - 0\right| < \frac{1}{m} < \varepsilon$. Combining these two cases, we deduce that $s$ is
continuous at $c$. ◊
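
The construction of $\delta$ in Example 3.3.3 (7) can be carried out concretely. The sketch below makes several assumptions that are not in the text: rational inputs are modelled by Python's `fractions.Fraction` (which stores fractions in lowest terms), the irrational number $c$ is approximated by the floating-point value of $\sqrt{2}/2$, and the helper names `small_denominator_rationals`, `delta_for` and `check` are hypothetical. A float is itself a rational number, so this is only an illustration of the idea, not a verification of the proof.

```python
from fractions import Fraction
import math


def s(x: Fraction) -> Fraction:
    """Value of s at a rational x in [0, 1]: 1/q, where x = p/q in lowest terms."""
    return Fraction(1, x.denominator)


def small_denominator_rationals(m: int):
    """All rationals in [0, 1] whose denominator in lowest terms is at most m."""
    return sorted({Fraction(p, q) for q in range(1, m + 1) for p in range(q + 1)})


def delta_for(c: float, eps: float):
    """Return (m, delta): 1/m < eps, and delta is the distance from c to the
    nearest rational in [0, 1] with denominator at most m."""
    m = math.floor(1 / eps) + 1
    qs = small_denominator_rationals(m)
    return m, min(abs(c - float(q)) for q in qs)


def check(c: float, eps: float, max_den: int = 200) -> bool:
    """Test rationals p/q with q <= max_den: those within delta of c have s(x) < eps."""
    m, d = delta_for(c, eps)
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            x = Fraction(p, q)
            if abs(float(x) - c) < d and not (float(s(x)) < eps):
                return False
    return d > 0


if __name__ == "__main__":
    c = math.sqrt(2) / 2          # stand-in for an irrational number in [0, 1]
    for eps in (0.5, 0.1, 0.02):
        m, d = delta_for(c, eps)
        print(f"eps = {eps}: m = {m}, delta = {d:.6f}, check = {check(c, eps)}")
```

As $\varepsilon$ shrinks, more rationals with small denominators must be avoided, so $\delta$ shrinks as well; this is the finiteness argument of the proof in computational form.
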


Given the very strange nature of the function $s$ in Example 3.3.3 (7), the reader
might wonder whether for any non-degenerate closed bounded interval $C \subseteq \mathbb{R}$, and
for any subset $A \subseteq C$, it would be possible to find a function $f\colon C \to \mathbb{R}$ that is
continuous at every number in $A$ and discontinuous at every number in $C - A$. The
answer turns out to be no. For example, as seen by Exercise 8.4.7, there is no function
$g\colon [0, 1] \to \mathbb{R}$ that is continuous at every rational number in $[0, 1]$, and discontinuous
at every irrational number in $[0, 1]$. See [TBB01, Section 6.7] for a general discussion
of which subsets of $\mathbb{R}$ can be the set of numbers at which a function is continuous.

The following result is the analog for continuous functions of the Sign-Preserving 
Property for Limits (Theorem 3.2.4). 


Theorem 3.3.4 (Sign-Preserving Property for Continuous Functions). Let $A \subseteq \mathbb{R}$
be a non-empty set, let $c \in A$ and let $f\colon A \to \mathbb{R}$ be a function. Suppose that $f$ is
continuous at $c$.


1. If $f(c) > 0$, then there is some $M > 0$ and some $\delta > 0$ such that $x \in A$ and
$|x - c| < \delta$ imply $f(x) > M$.

2. If $f(c) < 0$, then there is some $N < 0$ and some $\delta > 0$ such that $x \in A$ and
$|x - c| < \delta$ imply $f(x) < N$.


Proof. Left to the reader in Exercise 3.3.5.

Our next theorem shows that continuity is well behaved with respect to addition,
subtraction, multiplication and division of functions. This theorem is very convenient 
for showing the continuity of functions that are built up out of simpler ones. 


Theorem 3.3.5. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $c \in A$, let $f, g\colon A \to \mathbb{R}$ be functions
and let $k \in \mathbb{R}$. Suppose that $f$ and $g$ are continuous at $c$.


1. $f + g$ is continuous at $c$.

2. $f - g$ is continuous at $c$.

3. $kf$ is continuous at $c$.

4. $fg$ is continuous at $c$.

5. If $g(c) \ne 0$, then $\frac{f}{g}$ is continuous at $c$.


Proof. If A were an open interval, then this theorem could be deduced immediately 
by combining Lemma 3.3.2 and Theorem 3.2.10. In the general case, where A is not 
necessarily an open interval, we cannot use Lemma 3.3.2, but an examination of the 
details of the proof of Theorem 3.2.10 reveals that that proof can easily be modified
to work in the present situation, simply by replacing $L$ with $f(c)$, replacing $M$ with
$g(c)$ and replacing $I - \{c\}$ with $A$. The details are left to the reader.


The following result is an immediate consequence of Theorem 3.3.5, and we omit 
the proof. 


Corollary 3.3.6. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $f, g\colon A \to \mathbb{R}$ be functions and
let $k \in \mathbb{R}$. Suppose that $f$ and $g$ are continuous. Then $f + g$, $f - g$, $kf$ and $fg$ are
continuous, and if $g(x) \ne 0$ for all $x \in A$ then $\frac{f}{g}$ is continuous.


Example 3.3.7. 


(1) Let $A \subseteq \mathbb{R}$ be a non-empty set. For each $n \in \mathbb{N}$, let $f_n\colon A \to \mathbb{R}$ be defined by
$f_n(x) = x^n$ for all $x \in A$. We will prove that $f_n$ is continuous for all $n \in \mathbb{N}$ by induction
on $n$. First, let $n = 1$. Then $f_n(x) = f_1(x) = x$ for all $x \in A$, and we have seen that
this function is continuous in Example 3.3.3 (1). Now let $n \in \mathbb{N}$, and suppose that
$f_n$ is continuous. Then $f_{n+1}(x) = x^{n+1} = x^n \cdot x = f_n(x) f_1(x)$ for all $x \in A$, and hence
$f_{n+1} = f_n f_1$. It follows from Theorem 3.3.5 (4) that $f_{n+1}$ is continuous. By induction,
we deduce that $f_n$ is continuous for all $n \in \mathbb{N}$.

Example 3.3.3 (1) shows that all constant functions $A \to \mathbb{R}$ are continuous, and it
then follows from the previous paragraph together with Theorem 3.3.5 (1) (3), that all
polynomial functions are continuous.

(2) Let $p\colon \mathbb{R} - \{0\} \to \mathbb{R}$ be defined by $p(x) = \frac{1}{x}$ for all $x \in \mathbb{R} - \{0\}$. We saw in
Example 3.3.3 (2) that $p$ is continuous, using the $\varepsilon$–$\delta$ definition of continuity. A much
simpler proof can be obtained using Theorem 3.3.5 (5) by observing that the functions
$h, k\colon \mathbb{R} - \{0\} \to \mathbb{R}$ defined by $h(x) = 1$ and $k(x) = x$ for all $x \in \mathbb{R} - \{0\}$ are
continuous by Example 3.3.3 (1), and that $p = \frac{h}{k}$. ◊


We now see that continuity is also well behaved with respect to the composition 
of functions. 


Theorem 3.3.8. Let $A, B \subseteq \mathbb{R}$ be non-empty sets, let $c \in A$ and let $g\colon A \to B$ and
$f\colon B \to \mathbb{R}$ be functions.

1. Suppose that $A$ is an open interval. If $\lim_{x \to c} g(x)$ exists and is in $B$, and if $f$ is
continuous at $\lim_{x \to c} g(x)$, then $\lim_{x \to c} f(g(x)) = f\left(\lim_{x \to c} g(x)\right)$.

2. If $g$ is continuous at $c$, and if $f$ is continuous at $g(c)$, then $f \circ g$ is continuous
at $c$.

3. If $g$ and $f$ are continuous, then $f \circ g$ is continuous.


Proof.

(1) Suppose that $\lim_{x \to c} g(x)$ exists and is in $B$, and that $f$ is continuous at $\lim_{x \to c} g(x)$.
Let $L = \lim_{x \to c} g(x)$. Let $\varepsilon > 0$. Then there is some $\eta > 0$ such that $y \in B$ and $|y - L| < \eta$
imply $|f(y) - f(L)| < \varepsilon$, and there is some $\delta > 0$ such that $x \in A - \{c\}$ and $|x - c| < \delta$
imply $|g(x) - L| < \eta$. Suppose that $x \in A - \{c\}$ and $|x - c| < \delta$. Then $|g(x) - L| < \eta$,
and hence $|f(g(x)) - f(L)| < \varepsilon$, which means $\left|f(g(x)) - f\left(\lim_{x \to c} g(x)\right)\right| < \varepsilon$. It follows
that $\lim_{x \to c} f(g(x)) = f\left(\lim_{x \to c} g(x)\right)$.

(2) Because $A$ is not necessarily an open interval, this part of the theorem cannot
be deduced from Part (1) of the theorem. However, the proof of this part of the
theorem is very similar to the proof of Part (1), but with $g(c)$ replacing $\lim_{x \to c} g(x)$; the
details are left to the reader.

(3) This part of the theorem follows immediately from Part (2) of the theorem.


Whereas the composition of continuous functions works nicely, as stated in Theo- 
rem 3.3.8, the composition of discontinuous functions can behave rather strangely. 


Example 3.3.9. 
(1) Let $h, k\colon \mathbb{R} \to \mathbb{R}$ be defined by

$$h(x) = \begin{cases} 1, & \text{if } x \ne 0 \\ 0, & \text{if } x = 0, \end{cases} \qquad k(x) = \begin{cases} 2, & \text{if } x \ne 3 \\ 0, & \text{if } x = 3. \end{cases}$$

It is straightforward to verify that $h$ is discontinuous at 0, but continuous everywhere
else, and that $k$ is discontinuous at 3, but continuous everywhere else; the details are
left to the reader. Observe that $(k \circ h)(x) = 2$ for all $x \in \mathbb{R}$, so that $k \circ h$ is a constant
function, and hence it is continuous by Example 3.3.3 (1). We therefore see that the
composition of two discontinuous functions can be continuous.

(2) Let $h\colon \mathbb{R} \to \mathbb{R}$ be the function given in Part (1) of this example, let $r\colon [0,1] \to \mathbb{R}$
be the function given in Example 3.3.3 (6) and let $s\colon [0,1] \to \mathbb{R}$ be the function
given in Example 3.3.3 (7). We saw that $h$ is discontinuous at 0, and continuous
everywhere else in $[0,1]$, and that $s$ is discontinuous at every rational number in $[0,1]$,
and continuous at every irrational number in $[0,1]$. Observe that $h \circ s = r$, and that $r$
was seen to be discontinuous everywhere, which shows that the composition of two
discontinuous functions can have “worse” discontinuity than either of the original
functions. ◊

Our next result shows that two continuous functions on adjacent closed bounded
intervals can be “pasted together” to form a continuous function if they agree on the
point common to both domains.


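Lemma 3.3.10 (Pasting Lemma). Let $[a,b] \subseteq \mathbb{R}$ and $[b,c] \subseteq \mathbb{R}$ be non-degenerate
closed bounded intervals, and let $f\colon [a,b] \to \mathbb{R}$ and $g\colon [b,c] \to \mathbb{R}$ be functions. Let
$h\colon [a,c] \to \mathbb{R}$ be defined by

$$h(x) = \begin{cases} f(x), & \text{if } x \in [a,b] \\ g(x), & \text{if } x \in [b,c]. \end{cases}$$

If $f$ and $g$ are continuous, and if $f(b) = g(b)$, then $h$ is continuous.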


Proof. Left to the reader in Exercise 3.3.10. 
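The construction in the Pasting Lemma can be sketched in a few lines of Python. This is only an illustration of how the pasted function is assembled, under the assumption that functions are represented as Python callables on floats; the name `paste` and the sample functions are hypothetical, and the code does not prove continuity.

```python
def paste(f, g, a, b, c):
    """Paste f on [a, b] and g on [b, c] into a single function on [a, c].

    The agreement condition f(b) == g(b) from the lemma is checked up front,
    because without it the pasted function is not well defined at b.
    """
    if f(b) != g(b):
        raise ValueError("f and g must agree at the common point b")

    def h(x):
        if a <= x <= b:
            return f(x)
        if b < x <= c:
            return g(x)
        raise ValueError("x is outside [a, c]")

    return h

# Hypothetical example: x^2 on [0, 1] pasted with 2 - x on [1, 2]; both equal 1 at x = 1.
h = paste(lambda x: x * x, lambda x: 2 - x, 0.0, 1.0, 2.0)
print(h(0.5), h(1.0), h(1.5))
```

Near the common point the two pieces produce nearby values, which is the behaviour the lemma guarantees when both pieces are continuous.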


Finally, recall the concept of the extension of a function, as discussed prior to the 
proof of Theorem 2.7.1. We now see that not every continuous function with domain 
of the form A — {c} can be extended to a continuous function with domain A. 


Example 3.3.11. Consider the functions $f, p\colon \mathbb{R} - \{0\} \to \mathbb{R}$ defined by $f(x) = x$
for all $x \in \mathbb{R} - \{0\}$ and $p(x) = \frac{1}{x}$ for all $x \in \mathbb{R} - \{0\}$. Both of these functions are
continuous, as we saw in Example 3.3.7 (2). We observe that $f$ can be extended
to a continuous function $F\colon \mathbb{R} \to \mathbb{R}$ by defining $F(0) = 0$, so that $F(x) = x$ for all
$x \in \mathbb{R}$. On the other hand, the function $p$ cannot be extended to a continuous function
$\mathbb{R} \to \mathbb{R}$, as can be seen intuitively by looking at the graph of $p$, and can be proved by
combining Lemma 3.3.2 with the fact that $\lim_{x \to 0} p(x)$ does not exist, which we saw in
Example 3.2.3 (3). ◊


Reflections 


In contrast to limits, which are often viewed by beginning real analysis students 
as a necessary technicality at best, the intuitive concept of continuity is one that most 
people find quite simple and understandable. It is easy to see the intuitive difference 
between a function that has a graph that can be drawn without lifting one’s pencil from 
the page and a function that has a graph that cannot be drawn that way. Continuity is 
therefore a concept for which there is a large gap between the intuitive idea, which is 
simple, and the rigorous definition, which is technical. It is worthwhile to take the
time to convince yourself that the $\varepsilon$–$\delta$ formulation really captures the idea of drawing
a graph without lifting one’s pencil.

The main topics of study in real analysis are the central concepts encountered in
calculus courses such as derivatives, integrals, sequences, series and the like. In the
context of real analysis, the concept of continuity plays a supporting role, though a 
technically important one. If the reader studies topology, however, then she will see 
continuity in a starring role. Continuous functions in topology play the analogous 
role to what homomorphisms play in abstract algebra and linear maps play in linear 
algebra, in that continuous functions are the type of function that preserves topological
structure. An excellent place to learn about the general concept of continuity is the
classic introductory topology text [Mun00].


Exercises 


Exercise 3.3.1. Using only the definition of continuity, prove that the following 
functions are continuous. 


(1) Let f: R— R be defined by f(x) =x*+1 forallx ER. 
(2) [Used in Example 4.2.3.] Let A C R be a set, and let g: A — R be defined by 
g(x) = |x| for allx € A. 


Exercise 3.3.2. [Used throughout.] Let $A \subseteq \mathbb{R}$ be a set, let $c \in A$ and let $f\colon A \to \mathbb{R}$
be a function.


(1) Prove that $f$ is continuous at $c$ if and only if there is some $\delta > 0$ such that
$f|_{A \cap (c - \delta, c + \delta)}$ is continuous at $c$.

(2) Let $B \subseteq A$ be a set. Suppose that $c \in B$. Prove that if $f$ is continuous at $c$, then
$f|_B$ is continuous at $c$. Deduce that if $f$ is continuous, then $f|_B$ is continuous.
Find an example to show that neither of these statements can be made into if
and only if statements.


Exercise 3.3.3. Let $I, J \subseteq \mathbb{R}$ be open intervals, and let $f\colon I \to \mathbb{R}$ be a function.
Suppose that $f$ is continuous. Let $x \in f^{-1}(J)$. Prove that there is some open interval
$K \subseteq \mathbb{R}$ such that $x \in K \cap I \subseteq f^{-1}(J)$.


Exercise 3.3.4. [Used in Theorem 3.5.2 and Exercise 3.5.8.] Let $A, B \subseteq \mathbb{R}$ be non-empty
sets, and let $f\colon B \to \mathbb{R}$ be a function. Suppose that $A \subseteq B$.


(1) Suppose that $A$ and $f(A)$ have least upper bounds, and that $\operatorname{lub} A \in B$. Prove
that if $f$ is continuous at $\operatorname{lub} A$ then $f(\operatorname{lub} A) \le \operatorname{lub} f(A)$.

(2) Suppose that $A$ and $f(A)$ have greatest lower bounds, and that $\operatorname{glb} A \in B$. Prove
that if $f$ is continuous at $\operatorname{glb} A$ then $f(\operatorname{glb} A) \ge \operatorname{glb} f(A)$.


Exercise 3.3.5. [Used in Theorem 3.3.4.] Prove Theorem 3.3.4. 


Exercise 3.3.6. Theorem 3.3.8 (1) was stated and proved for the case that A is an 
open interval and B is an arbitrary set. Give a simpler proof of this result in the case 
where B is also an open interval. 


Exercise 3.3.7. [Used in Exercise 3.3.8.] Let $A \subseteq \mathbb{R}$ be a set, let $c \in A$, and let $f\colon A \to \mathbb{R}$
be a function. Prove that if $f$ is continuous at $c$, then there is some $\delta > 0$ such that
$f|_{A \cap (c - \delta, c + \delta)}$ is bounded.


Exercise 3.3.8. [Used in Section 3.4.] Let C C R be a closed bounded inter- 
val, and let f: C — R be a function. Prove that if f is continuous, then f is 
bounded. [Use Exercise 3.3.7.] 


Exercise 3.3.9. [Used in Section 3.3.] Let $A \subseteq \mathbb{R}$ be a non-empty set, let $c \in A$ and let
$f\colon A \to \mathbb{R}$ be a function. Suppose that there is some $L > 0$ such that
$(A - \{c\}) \cap (c - L, c + L) = \emptyset$. Prove that $f$ is continuous at $c$.


Exercise 3.3.10. [Used in Lemma 3.3.10.] Prove Lemma 3.3.10.


Exercise 3.3.11. [Used in Lemma 7.3.4.] Let $A \subseteq \mathbb{R}$ be a set, let $f\colon A \to \mathbb{R}$ be a
function and let $p \in \mathbb{R}$. Let $A + p$ denote the set $\{a + p \mid a \in A\}$. (The notation “$A + p$”
is similar to the notation “$A + B$” used in Exercise 2.6.9, where here we write “$p$”
instead of “$\{p\}$.”) Let $g\colon A + p \to \mathbb{R}$ be defined by $g(x) = f(x - p)$ for all $x \in A + p$.
Prove that if $f$ is continuous, then $g$ is continuous.


3.4 Uniform Continuity 


Continuity is a very important, and intuitively appealing, property of functions, but 
it turns out that for the proofs of some theorems in real analysis, for example the 
fact that a continuous function is integrable, a strengthened version of continuity is 
needed. To understand this strengthened version, let us first review what it means 
for a function to be continuous. Suppose that $f\colon A \to \mathbb{R}$ is a continuous function for
some set $A \subseteq \mathbb{R}$. Then $f$ is continuous at each $c \in A$. Stated informally, to say that $f$
is continuous at each $c \in A$ means that for each choice of $c \in A$, and for each $\varepsilon > 0$,
we can find some $\delta > 0$ such that if $x \in A$ and $x$ is within distance $\delta$ of $c$, then
$f(x)$ is within distance $\varepsilon$ of $f(c)$. The choice of $\delta$ here depends upon both $c$ and $\varepsilon$,
and of course upon the function $f$. For smaller $\varepsilon$ we need smaller $\delta$, and so we cannot
avoid the fact that $\delta$ depends upon $\varepsilon$ (other than in exceptional cases, such as constant
functions). Could it be the case that for a given $\varepsilon$, we could use the same $\delta$ for all
values of $c$? The answer in general is no, though for some functions it is yes.

Let $g$ and $f$ be the functions whose graphs are seen in Figure 3.4.1 and Figure 3.4.2,
respectively. If we compare what happens at the two points $c$ and $d$ in the domain of
the function $g$, we see that for the same $\varepsilon$, we need a much smaller $\delta$ at $c$ than we do
at $d$ (note that $\delta$ is not labeled in the figure for lack of space). As we take values of
$c$ closer and closer to 0, we need smaller and smaller values of $\delta$ for the same
$\varepsilon$. By contrast, we see in the graph of $f$ that for any given $\varepsilon$, it is possible to choose
a value of $\delta$ that works with respect to this $\varepsilon$ for any value of $c$ (intuitively, choose
the $\delta$ that works where the graph has the largest slope). Of course, we are looking at
these graphs just for the intuitive idea; we will see proofs of what we have asserted in
Example 3.4.3 (2) and Exercise 4.4.6.

From the above examples, we see that whereas in principle the choice of $\delta$ when
proving that a function is continuous depends upon $c$ and $\varepsilon$, for some functions the
choice of $\delta$ depends only upon $\varepsilon$.
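The dependence of $\delta$ on $c$ can be made concrete for the function $x \mapsto \frac{1}{x}$ on positive numbers. The following Python sketch rests on a formula that does not appear in the text: solving $\frac{1}{c - \delta} - \frac{1}{c} = \varepsilon$ for $\delta$ suggests that the largest workable $\delta$ at a point $c > 0$ is $\frac{\varepsilon c^2}{1 + \varepsilon c}$. The name `largest_delta` is hypothetical, and the code is a numerical illustration rather than a proof.

```python
def largest_delta(c, eps):
    """Largest delta that works for x -> 1/x at a point c > 0 (derived, not from the text)."""
    return eps * c * c / (1 + eps * c)

eps = 0.5
for c in (1.0, 0.1, 0.01):
    # The same eps forces a smaller and smaller delta as c approaches 0.
    print(c, largest_delta(c, eps))
```

Because these values shrink toward 0 as $c$ approaches 0, no single positive $\delta$ can serve every $c$ for this function, which matches the informal discussion above.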

To obtain a better understanding of this situation, let us turn to the definition of 
continuity in terms of quantifiers and logical symbols. Again, suppose that we are 
given a function $f\colon A \to \mathbb{R}$ for some set $A \subseteq \mathbb{R}$. The condition that $f$ is continuous at
each $c \in A$ is expressed by writing

$$(\forall c \in A)[f \text{ is continuous at } c],$$

which can be written completely in symbols as

$$(\forall c \in A)(\forall \varepsilon > 0)(\exists \delta > 0)[(x \in A \wedge |x - c| < \delta) \to |f(x) - f(c)| < \varepsilon].$$

Fig. 3.4.2.


As always, the order of the quantifiers is crucial. Because we are first given $c$ and
$\varepsilon$, and we then show that there exists an appropriate $\delta$, the choice of $\delta$ can depend
upon both $c$ and $\varepsilon$. However, given that we saw above that it could happen for some
functions that $\delta$ does not depend upon $c$, even though $\delta$ is quantified after $c$ in the
given order of quantifiers, it is therefore possible that for some functions we could
replace the above statement in logical symbols with

$$(\forall \varepsilon > 0)(\exists \delta > 0)(\forall c \in A)[(x \in A \wedge |x - c| < \delta) \to |f(x) - f(c)| < \varepsilon].$$


In this new formulation, the roles of $x$ and $c$ are in fact equivalent, even if it might not
appear as such at first, and we can therefore rewrite this formulation as

$$(\forall \varepsilon > 0)(\exists \delta > 0)(\forall x \in A)(\forall y \in A)[|x - y| < \delta \to |f(x) - f(y)| < \varepsilon].$$

In this last formulation we renamed “c” as “y,” which does not make a difference
logically, but the symbols “x” and “y” better suggest a parallel role for the two
numbers. For some functions we can use this revised order for the quantifiers, and for
other functions we cannot. In those situations where we can find $\delta$ that depends only
upon $\varepsilon$, and not $c$, we obtain a stronger version of continuity, which we now define.


Definition 3.4.1. Let $A \subseteq \mathbb{R}$ be a set, and let $f\colon A \to \mathbb{R}$ be a function. The function $f$
is uniformly continuous if for each $\varepsilon > 0$, there is some $\delta > 0$ such that $x, y \in A$ and
$|x - y| < \delta$ imply $|f(x) - f(y)| < \varepsilon$. △


Observe that in contrast to the notion of continuity, which is defined separately at
each number in the domain of the function, we do not have the concept of “uniformly
continuous at a point,” because the whole idea is that the same $\delta$ works for a given $\varepsilon$
for all points in the domain.

The definitions of continuity and uniform continuity immediately imply the 
following lemma, and we omit the proof. 


Lemma 3.4.2. Let $A \subseteq \mathbb{R}$ be a set, and let $f\colon A \to \mathbb{R}$ be a function. If $f$ is uniformly
continuous, then $f$ is continuous.


Whereas uniform continuity implies continuity, it is not always the case that 
continuity implies uniform continuity, as we see in Part (2) of the following example. 


Example 3.4.3. 


(1) Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = mx + b$ for all $x \in \mathbb{R}$, where $m, b \in \mathbb{R}$. It
was shown in Example 3.3.3 (1) that $f$ is continuous, and we will now show that $f$ is
uniformly continuous. There are two cases. First, suppose that $m = 0$. In that case $f$ is
a constant function, and $|f(x) - f(y)| = 0$ for all $x, y \in \mathbb{R}$. Hence any $\delta > 0$ works for
any $\varepsilon > 0$. Second, suppose that $m \ne 0$. Let $\varepsilon > 0$. Let $\delta = \frac{\varepsilon}{|m|}$. Suppose that $x, y \in \mathbb{R}$
and $|x - y| < \delta$. Then $|f(x) - f(y)| = |(mx + b) - (my + b)| = |m| \cdot |x - y| < |m| \cdot \delta = \varepsilon$.

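(2) Let $g\colon \mathbb{R} - \{0\} \to \mathbb{R}$ be defined by $g(x) = \frac{1}{x}$ for all $x \in \mathbb{R} - \{0\}$. We saw in
Example 3.3.3 (2) that $g$ is continuous. We will now show that $g$ is not uniformly
continuous, which corresponds to what we saw intuitively in Figure 3.4.1, which has
part of the graph of $g$. To prove that $g$ is not uniformly continuous, we show that there
is some $\varepsilon > 0$ such that for every $\delta > 0$, there are $x, y \in \mathbb{R} - \{0\}$ such that $|x - y| < \delta$ and
$|g(x) - g(y)| \ge \varepsilon$.

Let $\varepsilon = 1$. Let $\delta > 0$. Let $x = \sqrt{\delta}$ and $y = \frac{\sqrt{\delta}}{\sqrt{\delta} + 1}$. Then

$$|x - y| = \left|\sqrt{\delta} - \frac{\sqrt{\delta}}{\sqrt{\delta} + 1}\right| = \frac{\delta}{\sqrt{\delta} + 1} < \delta,$$

and

$$|g(x) - g(y)| = \left|\frac{1}{x} - \frac{1}{y}\right| = \left|\frac{1}{\sqrt{\delta}} - \frac{\sqrt{\delta} + 1}{\sqrt{\delta}}\right| = 1 = \varepsilon.$$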


Therefore $g$ is not uniformly continuous.

(3) Let $h\colon (1,\infty) \to \mathbb{R}$ be defined by $h(x) = \frac{1}{x}$ for all $x \in (1,\infty)$. Observe that $h$
is just the restriction to $(1,\infty)$ of the function $g$ in Part (2) of this example. As we
now see, the restriction of the domain of $g$ to $(1,\infty)$ yields a uniformly continuous
function. Let $\varepsilon > 0$. Let $\delta = \varepsilon$. Suppose that $x, y \in (1,\infty)$ and $|x - y| < \delta$. Then

$$|h(x) - h(y)| = \left|\frac{1}{x} - \frac{1}{y}\right| = \left|\frac{y - x}{xy}\right| = \frac{|x - y|}{xy} < \frac{\delta}{1 \cdot 1} = \varepsilon.$$

◊
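The witnesses used in Part (2) of this example can be checked numerically. The short Python sketch below evaluates the pair $x = \sqrt{\delta}$, $y = \frac{\sqrt{\delta}}{\sqrt{\delta} + 1}$ for a few values of $\delta$; the function name `witnesses` is hypothetical, and floating-point arithmetic makes the results approximate.

```python
from math import sqrt

def g(x):
    return 1 / x

def witnesses(delta):
    """The pair x = sqrt(delta), y = sqrt(delta) / (sqrt(delta) + 1) from Part (2)."""
    r = sqrt(delta)
    return r, r / (r + 1)

for delta in (1.0, 0.1, 0.001):
    x, y = witnesses(delta)
    # |x - y| stays below delta, while |g(x) - g(y)| stays at 1 up to rounding.
    print(delta, abs(x - y), abs(g(x) - g(y)))
```

However small $\delta$ is taken, the two points are closer together than $\delta$ while their images remain a distance 1 apart, which is exactly the failure of uniform continuity with $\varepsilon = 1$.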


Whereas a comparison of the definitions of uniform continuity and continuity 
appears to be a matter of epsilons, deltas and quantifiers, a comparison of the various 
functions examined in Example 3.4.3 yields the intuitive idea that a function is 
uniformly continuous if it is continuous and if “$f(x)$ does not change too rapidly as
$x$ changes.” Although the definition of uniform continuity does not have anything
to do with differentiability (which we have not yet defined), the intuitive notion of 
not changing too rapidly is reminiscent of the intuitive notion of derivatives. Indeed, 
as will be seen in Exercise 4.4.6, if a function is differentiable and has bounded 
derivative, then it is uniformly continuous. However, it is important to stress that 
differentiability is not needed for a function to be uniformly continuous, and we 
mention it here only as an aid to our intuition. 

Continuous functions are not always uniformly continuous, but there is one 
common, and very useful, situation in which continuity does imply uniform continuity, 
as we see in the following theorem. This theorem is our first truly substantial result 
involving limits and continuity. In contrast to all the other proofs up till now in the 
present chapter, which were relatively straightforward, and which relied upon only the 
algebraic properties of the real numbers, the proof of Theorem 3.4.4, while not long, 
relies upon the Least Upper Bound Property of the real numbers (via the Heine—Borel 
Theorem (Theorem 2.6.14), which is proved using the Least Upper Bound Property). 
Put another way, whereas all the previous results in this chapter would still be true if
we considered functions defined on subsets of the rational numbers, Theorem 3.4.4
would not be true in such a situation. For example, let $f\colon [0,2] \cap \mathbb{Q} \to \mathbb{Q}$ be defined
by $f(x) = \frac{1}{x^2 - 2}$ for all $x \in [0,2] \cap \mathbb{Q}$. This function is continuous, because the analogs
of all results proved in Section 3.3 would still hold for functions defined on subsets of
$\mathbb{Q}$, and because the denominator of $\frac{1}{x^2 - 2}$ is never zero (because $\sqrt{2} \notin \mathbb{Q}$). However,
the function is not uniformly continuous, by an argument similar to that used in
Example 3.4.3 (2).
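The failure of uniform continuity over the rational numbers can be seen with exact arithmetic. The following Python sketch assumes that the function in question is $f(x) = \frac{1}{x^2 - 2}$ on $[0,2] \cap \mathbb{Q}$, and it evaluates $f$ at the continued-fraction convergents $1, \frac{3}{2}, \frac{7}{5}, \frac{17}{12}, \ldots$ of $\sqrt{2}$; the use of convergents and the name `convergents` are choices made for this illustration, not something taken from the text.

```python
from fractions import Fraction

def f(x):
    """f(x) = 1 / (x^2 - 2), evaluated exactly on rational input."""
    return 1 / (x * x - 2)

def convergents(count):
    """First `count` continued-fraction convergents of sqrt(2): 1, 3/2, 7/5, 17/12, ..."""
    p, q = 1, 1
    out = []
    for _ in range(count):
        out.append(Fraction(p, q))
        p, q = p + 2 * q, p + q
    return out

xs = convergents(8)
for x, y in zip(xs, xs[1:]):
    # The inputs get closer together while the outputs move further apart.
    print(float(abs(x - y)), abs(f(x) - f(y)))
```

Consecutive convergents become arbitrarily close to each other, yet the values of $f$ at them differ by at least 1 (in fact by ever larger amounts), so no single $\delta$ can work for $\varepsilon = 1$.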


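Theorem 3.4.4. Let $C \subseteq \mathbb{R}$ be a closed bounded interval, and let $f\colon C \to \mathbb{R}$ be a
function. If $f$ is continuous, then $f$ is uniformly continuous.

Proof. Suppose that $f$ is continuous. Let $\varepsilon > 0$. By the definition of continuity,
for each $z \in C$ there is some $\delta_z > 0$ such that $x \in C$ and $|x - z| < \delta_z$
imply $|f(x) - f(z)| < \frac{\varepsilon}{2}$. We then form the family $\left\{\left(w - \frac{\delta_w}{2}, w + \frac{\delta_w}{2}\right)\right\}_{w \in C}$ of
open intervals in $\mathbb{R}$. Because $w \in \left(w - \frac{\delta_w}{2}, w + \frac{\delta_w}{2}\right)$ for all $w \in C$, then
$C \subseteq \bigcup_{w \in C} \left(w - \frac{\delta_w}{2}, w + \frac{\delta_w}{2}\right)$. The Heine–Borel Theorem (Theorem 2.6.14) implies that
there are $n \in \mathbb{N}$ and $w_1, w_2, \ldots, w_n \in C$ such that $C \subseteq \bigcup_{i=1}^{n} \left(w_i - \frac{\delta_{w_i}}{2}, w_i + \frac{\delta_{w_i}}{2}\right)$. Let

$$\delta = \min\left\{\frac{\delta_{w_1}}{2}, \ldots, \frac{\delta_{w_n}}{2}\right\}.$$

Suppose that $x, y \in C$ and $|x - y| < \delta$. Because $y \in C$, there is some $p \in \{1, \ldots, n\}$
such that $y \in \left(w_p - \frac{\delta_{w_p}}{2}, w_p + \frac{\delta_{w_p}}{2}\right)$. Hence $|y - w_p| < \frac{\delta_{w_p}}{2}$. By the definition of $\delta$,
we also know that $|x - y| < \frac{\delta_{w_p}}{2}$. It follows that

$$|x - w_p| = |x - y + y - w_p| \le |x - y| + |y - w_p| < \frac{\delta_{w_p}}{2} + \frac{\delta_{w_p}}{2} = \delta_{w_p}.$$

We now deduce from the choice of $\delta_{w_p}$ that $|f(y) - f(w_p)| < \frac{\varepsilon}{2}$ and $|f(x) - f(w_p)| < \frac{\varepsilon}{2}$.
Therefore

$$|f(x) - f(y)| = |f(x) - f(w_p) + f(w_p) - f(y)| \le |f(x) - f(w_p)| + |f(w_p) - f(y)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$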


Recall the definition of a function being bounded (given in Section 3.2). Is there a 
relation between continuity or uniform continuity and boundedness? Clearly, a func- 
tion can be continuous and not bounded, for example the function in Example 3.4.3 (2). 
A function can also be uniformly continuous and not bounded, for example the func- 
tion in Example 3.4.3 (1). However, there is something very different about these two 
examples. For the linear function in Example 3.4.3 (1), the fact that the function is 
not bounded is due to the fact that the domain is not bounded; that is, if we restrict 
the function in Example 3.4.3 (1) to any bounded interval, then the restricted function 
is itself bounded. By contrast, if we restrict the function in Example 3.4.3 (2) to the 
interval (0,1), then the restricted function is still not bounded, even though the domain 
of the restricted function is bounded. The difference between these two functions 
is precisely the difference between uniform continuity and regular continuity. If a 
function is uniformly continuous, then intuitively f(x) cannot change too much if x 
does not change too much, and that suggests that the only way a uniformly continuous 
function can be not bounded is if its domain is not bounded. We now state and prove 
this fact. 


Theorem 3.4.5. Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $f\colon A \to \mathbb{R}$ be a function.
Suppose that $A$ is bounded. If $f$ is uniformly continuous, then $f$ is bounded.


Proof. Suppose that $f$ is uniformly continuous. Because the set $A$ is bounded, there
is some $M \in \mathbb{R}$ such that $|x| \le M$ for all $x \in A$. Hence $A \subseteq [-M, M]$. We may assume
that $M > 0$. Because $f$ is uniformly continuous, there is some $\delta > 0$ such that $x, y \in A$
and $|x - y| < \delta$ imply $|f(x) - f(y)| < 1$. By Corollary 2.6.8 (2) there is some $n \in \mathbb{N}$
such that $\frac{1}{n} < \frac{\delta}{2M}$, which implies $\frac{2M}{n} < \delta$.

We now divide the interval $[-M, M]$ into $n$ equal subintervals, which we do by
letting $x_0, x_1, \ldots, x_n \in [-M, M]$ be defined by the conditions that $-M = x_0 < x_1 <
\cdots < x_n = M$ and that $x_i - x_{i-1} = \frac{2M}{n}$ for all $i \in \{1, \ldots, n\}$.

Let $i \in \{1, \ldots, n\}$. Let $E_i \in \mathbb{R}$ be defined as follows. If $A \cap [x_{i-1}, x_i] = \emptyset$, then let
$E_i = 0$. If $A \cap [x_{i-1}, x_i] \ne \emptyset$, then choose some $e_i \in A \cap [x_{i-1}, x_i]$ (it does not matter
which $e_i$ is chosen), and let $E_i = |f(e_i)| + 1$. Next, let $E = \max\{E_1, E_2, \ldots, E_n\}$.

Let $y \in A$. Then $y \in A \cap [x_{k-1}, x_k]$ for some $k \in \{1, \ldots, n\}$, and therefore $|y - e_k| \le
|x_k - x_{k-1}| = \frac{2M}{n} < \delta$. It follows that $|f(y) - f(e_k)| < 1$, and then Lemma 2.3.9 (7)
implies that $|f(y)| < |f(e_k)| + 1 = E_k \le E$. We deduce that $f$ is bounded, with bound
$E$.
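The construction of the bound $E$ in this proof is effectively an algorithm, and it can be sketched in Python. The sketch below makes several simplifying assumptions that are not in the text: the set $A$ is taken to be the whole interval $[-M, M]$, each $e_i$ is chosen to be the midpoint of its subinterval, and the example function is $f(x) = x^2$ on $[-2, 2]$, for which $\delta = \frac{1}{4}$ works with $\varepsilon = 1$ because $|x^2 - y^2| \le 4|x - y|$ there. The name `bound_on_interval` is hypothetical.

```python
from math import floor

def bound_on_interval(f, M, delta):
    """Return (n, E) built as in the proof, assuming A = [-M, M] and that
    |x - y| < delta implies |f(x) - f(y)| < 1 on that interval."""
    n = floor(2 * M / delta) + 1          # n > 2M/delta, so 2M/n < delta
    width = 2 * M / n                     # common length of the n subintervals
    # e_i is taken to be the midpoint of [x_{i-1}, x_i]; E_i = |f(e_i)| + 1.
    E = max(abs(f(-M + (i - 0.5) * width)) + 1 for i in range(1, n + 1))
    return n, E

def f(x):
    return x * x

M, delta = 2.0, 0.25
n, E = bound_on_interval(f, M, delta)
print(n, E)
```

The value of `E` obtained this way is generally not the best possible bound; the proof only needs some bound, and the extra 1 in each $E_i$ absorbs the variation of $f$ within a subinterval.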


The following result was proved in Exercise 3.3.8 by a direct use of the Heine— 
Borel Theorem (Theorem 2.6.14), but a particularly simple proof can be obtained 
using theorems we have seen in this section. Of course, we cannot really escape the 
Heine—Borel Theorem here, because it is used in the proof of Theorem 3.4.4, but now 
that we have proved the latter, we obtain the result stated in Exercise 3.3.8 with no 
extra work. 


Corollary 3.4.6. Let C C R be a closed bounded interval, and let f: C — R be a 
function. If f is continuous, then f is bounded. 


Proof. Suppose that f is continuous. By Theorem 3.4.4 we know that f is uniformly 
continuous. Because C is bounded, we can apply Theorem 3.4.5 to deduce that f is 
bounded. 


Reflections 


In contrast to the concept of continuity, which from an intuitive point of view 
is both easy to understand and is familiar from calculus courses, the concept of 
uniform continuity is neither intuitively simple nor familiar from previous courses. 
Nonetheless, uniform continuity is a very important technical concept that shows up 
in the proofs of a number of theorems in real analysis, for example Theorem 5.4.11, 
which states that continuous functions on non-degenerate closed bounded intervals 
are integrable. 

It is not hard to imagine how someone might have first thought of the concept 
of continuity at the intuitive level, simply by looking at graphs of functions; the 
rigorous definition, needless to say, was much harder to think of. We can speculate 
that uniform continuity, by contrast, might have been first conceptualized not from 
intuitive considerations but rather as a result of trying to prove theorems such as 
Theorem 5.4.11, and noticing that something more than the definition of continuity 
was needed to make the proof work. It would have been subsequently necessary to 
investigate the relationship between the intuitively familiar concept of continuity and 
the technically necessary concept of uniform continuity; the culmination of such an 
investigation would have been Theorem 3.4.4, a theorem that is used in the proof of 
Theorem 5.4.11. Of course, the actual historical development of mathematics does 
not always follow the sort of logical order just described, but it is nonetheless useful 
to think about how mathematical ideas might have developed logically, in order to 
understand their significance.

The study of uniform continuity should serve to reinforce the importance of
quantifiers, because the difference between the definitions of continuity and uniform 
continuity is precisely in the order of the quantifiers. The formulation of rigorous 
proofs relies upon a good understanding of the use of quantifiers, and nowhere is this 
fact more apparent than in proofs involving uniform continuity. 


Exercises 


Exercise 3.4.1. Using only the definition of uniform continuity, prove that the fol- 
lowing functions are uniformly continuous. 


(1) Let f: [0,3] — R be defined by f(x) = x’ for all x € [0,3]. 
(2) Let $g\colon [1,2] \to \mathbb{R}$ be defined by $g(x) = \sqrt{x}$ for all $x \in [1,2]$.


Exercise 3.4.2. Using only the definition of uniform continuity, prove that the function
$f\colon \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ for all $x \in \mathbb{R}$ is not uniformly continuous.


Exercise 3.4.3. Let A C R be a set, let f,g: A — R be functions and let k € R. 
Suppose that f and g are uniformly continuous. 


(1) Prove that f + g is uniformly continuous. 
(2) Prove that kf is uniformly continuous. 
(3) Find an example to show that fg need not be uniformly continuous. 


Exercise 3.4.4. Let $A, B \subseteq \mathbb{R}$ be sets, and let $g\colon A \to B$ and $f\colon B \to \mathbb{R}$ be functions.
Suppose that $f$ and $g$ are uniformly continuous. Prove that $f \circ g$ is uniformly
continuous.


Exercise 3.4.5. [Used in Exercise 4.4.6, Exercise 4.6.9 and Exercise 10.3.7.] Let $A \subseteq \mathbb{R}$
be a set, and let $f\colon A \to \mathbb{R}$ be a function. The function $f$ satisfies a Lipschitz
condition if there is some $K \in \mathbb{R}$ such that $|f(x) - f(y)| \le K|x - y|$ for all $x, y \in A$;
the number $K$ is called a Lipschitz constant for $f$.


(1) Prove that if $f$ satisfies a Lipschitz condition, then $f$ is uniformly continuous.
(2) Find an example of a function $g\colon [0,\infty) \to \mathbb{R}$ that is uniformly continuous but
does not satisfy a Lipschitz condition.


Exercise 3.4.6. [Used in Theorem 5.5.4.] Let $n \in \mathbb{N}$, and let $[a_1, b_1], \ldots, [a_n, b_n] \subseteq \mathbb{R}$
be closed bounded intervals. Let $f\colon [a_1, b_1] \cup \cdots \cup [a_n, b_n] \to \mathbb{R}$ be a function. Prove
that if $f$ is continuous, then $f$ is uniformly continuous and bounded.

[Use Exercise 2.5.14.]


Exercise 3.4.7. Find an example of a function $f\colon \mathbb{R} \to \mathbb{R}$ that is continuous and
bounded, but that is not uniformly continuous. Be sure to prove that the function is
not uniformly continuous.


Exercise 3.4.8. Find an example of two disjoint, non-empty sets $A, B \subseteq \mathbb{R}$ and a
function $f\colon A \cup B \to \mathbb{R}$ such that $f|_A$ and $f|_B$ are uniformly continuous, but that $f$ is
not uniformly continuous.


Exercise 3.4.9. Let $(a,b) \subseteq \mathbb{R}$ be a non-degenerate open bounded interval, and let
$f\colon (a,b) \to \mathbb{R}$ be a function. Prove that $f$ is uniformly continuous if and only if $f$ can
be extended to a continuous function $F\colon [a,b] \to \mathbb{R}$.

When proving that if $f$ is uniformly continuous then $f$ can be extended to a
continuous function $F\colon [a,b] \to \mathbb{R}$, it suffices to prove that $f$ can be extended to a
continuous function $G\colon [a,b) \to \mathbb{R}$; extending $G$ to a continuous function $F\colon [a,b] \to \mathbb{R}$
is completely analogous, and the details can be omitted. To define $G$, use the
one-sided analog of Exercise 3.2.18. [Use Exercise 3.3.2.]


3.5 Two Important Theorems 


We are now ready to state and prove two very important theorems concerning con- 
tinuous functions defined on closed bounded intervals, which are the Extreme Value 
Theorem and the Intermediate Value Theorem. Both of these theorems are encoun- 
tered informally in calculus courses, but in real analysis we see their worth more 
clearly, because they are useful tools in the proofs of important theorems that we will 
subsequently encounter. For example, the Extreme Value Theorem will be used in the 
proof of Rolle’s Theorem (Lemma 4.4.3), which in turn is used in the proof of the 
Mean Value Theorem (Theorem 4.4.4), and it is also used in the proof that all con- 
tinuous functions are integrable (Theorem 5.4.11). The Intermediate Value Theorem 
is used in the proof that the natural logarithm function is bijective (Lemma 7.2.4), 
which in turn is used to define the exponential function. 

Although both of these theorems are concerned with continuous functions defined 
on closed bounded intervals, it turns out that they involve very different aspects 
of closed bounded intervals. This difference can be fully understood only via the 
study of the concepts of “compactness” and “connectedness,” which are treated in 
an introductory course in point set topology. Indeed, each of the Extreme Value 
Theorem and the Intermediate Value Theorem can be greatly generalized through the 
use of these topological concepts. See the classic text [Mun00] for an introduction 
to point set topology, including connectedness and compactness, and generalized 
versions of the Extreme Value Theorem and the Intermediate Value Theorem. These 
generalizations are an instance where greater abstraction actually leads to greater 
clarity. 

Our first theorem, the Extreme Value Theorem, concerns the existence of max- 
imum and minimum values of functions. Must every function have a maximum 
value and a minimum value? The answer is clearly no, because of functions such as 
$f\colon \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x$ for all $x \in \mathbb{R}$. Can we find criteria that would guarantee
that a function has a maximum value and a minimum value? The function $f$ has a
domain that is not bounded, and so it is natural to ask whether functions with bounded
domains always have maximum and minimum values, but again the answer is no. The
function $g\colon (0,1) \to \mathbb{R}$ defined by $g(x) = \frac{1}{x}$ for all $x \in (0,1)$ has neither maximum
value nor minimum value. The apparent problem with the function $g$ is that its domain
is an open interval. Would it suffice to restrict our attention to functions with domains
that are closed bounded intervals? The answer is still no. The function $h\colon [0,1] \to \mathbb{R}$
defined by

$$h(x) = \begin{cases} \frac{1}{x}, & \text{if } x \in (0,1] \\ 0, & \text{if } x = 0 \end{cases}$$

has a minimum value at $x = 0$, but it does not have a maximum value. The problem
with the function $h$ is that it is not continuous. As seen in the following theorem, we
have found all possible difficulties, because every continuous function with domain a 
closed bounded interval has a maximum value and a minimum value. Observe that in 
the statement of our theorem, we are not concerned with the actual maximum value 
and minimum value of the function, but only that a maximum value and a minimum 
value occur somewhere in the domain. 
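The failure of the discontinuous function h to have a maximum value can be checked by exact computation. The following is a minimal sketch, assuming that h is given by h(x) = 1/x for x ∈ (0,1] and h(0) = 0; the helper name point_exceeding is hypothetical and is introduced only for illustration. For any positive integer bound, the sketch produces a point of (0,1] at which h exceeds that bound.

```python
from fractions import Fraction


def h(x: Fraction) -> Fraction:
    """Assumed form of h: 1/x on (0, 1], and 0 at x = 0."""
    if x == 0:
        return Fraction(0)
    if 0 < x <= 1:
        return 1 / x
    raise ValueError("x must lie in [0, 1]")


def point_exceeding(bound: int) -> Fraction:
    """Return a point x in (0, 1] with h(x) > bound (bound a positive integer)."""
    # h(1 / (bound + 1)) = bound + 1, which is larger than bound.
    return Fraction(1, bound + 1)


# No proposed bound works as a maximum value of h.
witnesses = {bound: h(point_exceeding(bound)) for bound in (1, 10, 10**6)}
```

Because such a point exists for every bound, no value of h can be a maximum, although h(0) = 0 is a minimum value.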


Theorem 3.5.1 (Extreme Value Theorem). Let C ⊆ R be a closed bounded interval,
and let f: C → R be a function. Suppose that f is continuous. Then there are
x_min, x_max ∈ C such that f(x_min) ≤ f(x) ≤ f(x_max) for all x ∈ C.


Proof. By Corollary 3.4.6 we know that f is bounded, which means that the set f(C)
is bounded. Because C ≠ ∅, then f(C) ≠ ∅. The Least Upper Bound Property and the
Greatest Lower Bound Property imply that f(C) has a least upper bound and a greatest
lower bound. We will show that there is some x_max ∈ C such that f(x_max) = lub f(C).
It will follow that f(x) ≤ f(x_max) for all x ∈ C. A similar proof can be used to find
x_min, and we omit the details.

For convenience let M = lub f(C). Then f(x) ≤ M for all x ∈ C. Suppose that
f(x) < M for all x ∈ C. Let g: C → R be defined by

\[
g(x) = \frac{1}{M - f(x)}
\]

for all x ∈ C. Because f is continuous, and because the denominator in the definition
of g is never zero, it follows from Example 3.3.3 (1) and Theorem 3.3.5 that g is
continuous.

By Corollary 3.4.6 again we know that g is bounded. Hence there is some P ∈ R
such that |g(x)| ≤ P for all x ∈ C. Observe that P > 0. Moreover, because we are
assuming that f(x) < M for all x ∈ C, it follows that g(x) > 0 for all x ∈ C. Hence
g(x) ≤ P for all x ∈ C, which means that

\[
\frac{1}{M - f(x)} \le P
\]

for all x ∈ C. Therefore

\[
f(x) \le M - \frac{1}{P}
\]

for all x ∈ C. We deduce that M − 1/P is an upper bound of f(C), which is a contradiction
to the fact that M = lub f(C). It is therefore not the case that f(x) < M for all
x ∈ C. Because f(x) ≤ M for all x ∈ C, there must be some x_max ∈ C such that
f(x_max) = M.

It is important to note that the numbers x_min and x_max, whose existence is guaranteed
in the statement of the Extreme Value Theorem (Theorem 3.5.1), are not
necessarily unique. Moreover, the Extreme Value Theorem does not tell us how to
find x_min and x_max; we are told only that they exist. This theorem is an example of an
“existence theorem,” as is the Intermediate Value Theorem, to which we now turn.

Suppose that f: [a,b] → R is a function for some closed bounded interval [a,b] ⊆
R. Must it be the case that f takes on all values between f(a) and f(b)? The answer
is clearly no. Let k: [0,1] → R be defined by

\[
k(x) = \begin{cases} 1, & \text{if } x \in (0,1] \\ 0, & \text{if } x = 0. \end{cases}
\]

Then k(0) = 0 and k(1) = 1, but k does not take on any values between 0 and 1. Of
course, the function k is not continuous, and that is the source of the problem. The
following theorem states, as expected, that a continuous function f: [a,b] → R takes
on all values between f(a) and f(b).


Theorem 3.5.2 (Intermediate Value Theorem). Let [a,b] ⊆ R be a closed bounded
interval, and let f: [a,b] → R be a function. Suppose that f is continuous. Let r ∈ R.
If r is strictly between f(a) and f(b), then there is some c ∈ (a,b) such that f(c) = r.


Proof. Suppose that r is strictly between f(a) and f(b). Without loss of generality,
assume that f(a) < r < f(b). Let

\[
S = \{x \in [a,b] \mid f(x) \le r\}.
\]

Then S ⊆ [a,b]. The set S is non-empty because a ∈ S, and S is bounded above by b.
The Least Upper Bound Property implies that S has a least upper bound. Let c = lub S.
Because a ∈ S, then a ≤ c, and because b is an upper bound of S, then c ≤ b. Hence
c ∈ [a,b]. We will show that f(c) = r, which we do by showing that f(c) ≤ r and that
f(c) ≥ r.

Because S ≠ ∅, then f(S) ≠ ∅. It is evident from the definition of S that f(S) is
bounded above by r. The Least Upper Bound Property again implies that f(S) has
a least upper bound. By Exercise 3.3.4 (1) we see that f(lub S) ≤ lub f(S). Hence
f(c) ≤ lub f(S). Because r is an upper bound of f(S), it follows that lub f(S) ≤ r,
and therefore f(c) ≤ r.

Because f(c) ≤ r < f(b), we see that c ≠ b. Hence c < b. It follows that the
interval (c,b] is non-degenerate. Let B = (c,b]. Then f(B) ≠ ∅. Clearly c = glb B.
Moreover, because c = lub S, it follows that B ⊆ [a,b] − S. Hence f(x) > r for all
x ∈ B. Therefore f(B) is bounded below by r. The Greatest Lower Bound Property
implies that f(B) has a greatest lower bound. By Exercise 3.3.4 (2) we see that
f(glb B) ≥ glb f(B). Hence f(c) ≥ glb f(B). Because r is a lower bound of f(B), it
follows that glb f(B) ≥ r, and therefore f(c) ≥ r. We deduce that f(c) = r.

Finally, because r ≠ f(a) and r ≠ f(b), it follows that c ≠ a and c ≠ b. Therefore
c ∈ (a,b).

Similarly to the Extreme Value Theorem (Theorem 3.5.1), the Intermediate Value
Theorem (Theorem 3.5.2) is also an existence theorem, in that we are given no
information on how to find the number c whose existence is guaranteed by the
theorem. Also similarly, the number c is not necessarily unique.
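Although the theorem itself provides no method for finding c, in practice one can often approximate such a number numerically. The following is a minimal sketch of the bisection method, which is a standard numerical technique and is not the argument used in the proof above; the function name bisect_ivt and its parameters tol and max_iter are hypothetical choices made for illustration, and the computation uses floating-point numbers, so it yields only an approximation.

```python
from typing import Callable


def bisect_ivt(f: Callable[[float], float], a: float, b: float, r: float,
               tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate some c in [a, b] with f(c) = r.

    Assumes that f is continuous on [a, b] and that r lies between
    f(a) and f(b); these assumptions are not checked beyond the sign test.
    """
    fa, fb = f(a) - r, f(b) - r
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("r is not between f(a) and f(b)")
    lo, hi = a, b
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        fm = f(mid) - r
        if fm == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half-interval on which f - r still changes sign.
        if (fm < 0) == (fa < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example: approximate c in [1, 2] with c**3 - c = 3.
c = bisect_ivt(lambda x: x**3 - x, 1.0, 2.0, 3.0)
```

When several numbers c satisfy f(c) = r, the sketch returns an approximation to only one of them, which is consistent with the remark that c is not necessarily unique.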

The proofs of the Extreme Value Theorem and the Intermediate Value Theorem 
both rely upon the Least Upper Bound Property. In fact, we will now show that both 
of these theorems are equivalent to the Least Upper Bound Property. While we are at 
it, we will also show that the Heine–Borel Theorem (Theorem 2.6.14) is equivalent to
the Least Upper Bound Property. 

What we mean by “equivalent” in this context is as follows. As stated in Sec- 
tion 2.2, we have taken as our hypotheses for R the axiom for an ordered field and the 
Least Upper Bound Property; from these assumptions we deduce all our results in real 
analysis. To say that a theorem that we have proved is equivalent to the Least Upper 
Bound Property means that if an ordered field F is assumed to satisfy this theorem, 
then the Least Upper Bound Property can be proved for F. In other words, for an
ordered field, the Least Upper Bound Property and the other theorem each imply the 
other. The proof of the equivalence of various theorems with the Least Upper Bound 
Property will be by contrapositive, where we suppose that F is an ordered field that 
does not satisfy the Least Upper Bound Property, and where we then show that the 
various theorems do not hold. 

Observe that any ordered field, whether or not it satisfies the Least Upper Bound 
Property, satisfies all the properties of R that do not rely upon the Least Upper 
Bound Property, for example all the properties of R that are proved in Sections 2.3-2.5.
Moreover, none of the results concerning limits and continuity that we saw in
Sections 3.2 and 3.3 rely upon the Least Upper Bound Property, and hence they hold 
for all ordered fields. 

We start with the following lemma about ordered fields that do not satisfy the 
Least Upper Bound Property. 


Lemma 3.5.3. Let F be an ordered field. Suppose that F does not satisfy the Least
Upper Bound Property. Let A ⊆ F be a non-empty set such that A is bounded above,
but A has no least upper bound. Let a ∈ A, and let b ∈ F be an upper bound of A. Let
Q = {x ∈ [a,b] | x is an upper bound of A} and P = [a,b] − Q.

1. P ∪ Q = [a,b] and P ∩ Q = ∅.

2. a < b, and A ∩ [a,b] ⊆ P, and a ∈ P, and b ∈ Q.

3. If x ∈ P and z ∈ Q, then x < z.

4. If x ∈ P, then there is some y ∈ P such that x < y. If z ∈ Q, then there is some
w ∈ Q such that w < z.

5. The set P does not have a least upper bound, and the set Q does not have a
greatest lower bound.

Proof.
(1) This part of the lemma is true by the definition of P and Q.

(2) Because a ∈ A and b is an upper bound of A, it follows that a ≤ b, and
that b ∈ Q. Because A has no least upper bound, then by Exercise 2.6.2 (2) we know
that no upper bound of A is in A. Therefore A ∩ Q = ∅, and hence A ∩ [a,b] ⊆ P. In
particular, we see that a ∈ P. By Part (1) of this lemma we know that P ∩ Q = ∅, and
therefore a ≠ b. It follows that a < b.

(3) Let U = {x ∈ F | x is an upper bound of A} and L = F − U. It follows from
Exercise 2.6.4 that if x ∈ L and y ∈ U, then x < y; although that exercise was stated
for R, its proof did not involve the Least Upper Bound Property, and hence it also
holds for the ordered field F. Because Q ⊆ U and P ⊆ L, it follows immediately that
if x ∈ P and z ∈ Q, then x < z.

(4) Let x ∈ P. Then x is not an upper bound of A by the definition of P, and hence
there is some y ∈ A such that x < y. Because b is an upper bound of A, then y ≤ b,
and because x ∈ P ⊆ [a,b], then a ≤ x < y. Hence y ∈ A ∩ [a,b], and it follows from
Part (2) of this lemma that y ∈ P. Let z ∈ Q. Then z is an upper bound of A. Because A
has no least upper bound, there is another upper bound w of A such that w < z. Then
a ≤ w < z ≤ b, and hence w ∈ Q.

(5) It follows from Part (3) of this lemma that everything in P is a lower bound of
Q, and everything in Q is an upper bound of P.

Suppose that Q has a greatest lower bound. Because a is a lower bound of Q and
b ∈ Q, then glb Q ∈ [a,b]. There are now two cases. First, suppose that glb Q ∈ Q.
Then glb Q is an upper bound of A. Let w be an upper bound of A. Because a ∈ A,
then w ≥ a. If w ≤ b, then w ∈ Q, and hence glb Q ≤ w. If w > b, then glb Q ≤ b < w.
Therefore glb Q is the least upper bound of A, which is a contradiction. Second,
suppose that glb Q ∈ P. By Part (4) of this lemma there is some y ∈ P such that
glb Q < y, which is a contradiction to the fact that glb Q is the greatest lower bound
of Q, because y is a lower bound of Q. We conclude that Q does not have a greatest
lower bound.

Now suppose that P has a least upper bound. Because a ∈ P and b is an upper
bound of P, then lub P ∈ [a,b]. Let v ∈ A. Then v is not an upper bound of A, as noted
in the proof of Part (2) of this lemma. Because b is an upper bound of A, then v ≤ b.
If v ≥ a, then v ∈ P, and hence lub P ≥ v. If v < a, then lub P ≥ a > v. Therefore lub P
is an upper bound of A, which implies that lub P ∈ Q. By Part (4) of this lemma there
is some w ∈ Q such that w < lub P, which is a contradiction to the fact that lub P is the
least upper bound of P. We conclude that P does not have a least upper bound.


Theorem 3.5.4. The following are equivalent. 


a. The Least Upper Bound Property. 
b. The Heine–Borel Theorem.

c. The Extreme Value Theorem. 

d. The Intermediate Value Theorem. 


Proof. We have already seen that the axioms of the real numbers, that is, the axiom
for an ordered field together with the Least Upper Bound Property, imply the
Heine–Borel Theorem, the Extreme Value Theorem and the Intermediate Value Theorem.
We will now show that each of these theorems, together with the axiom for an ordered
field, implies the Least Upper Bound Property, which we will do by letting F be an
ordered field that does not satisfy the Least Upper Bound Property, and then deducing
that F does not satisfy any of the Heine–Borel Theorem, the Extreme Value Theorem
and the Intermediate Value Theorem.

Let a, b, A, P and Q be as in Lemma 3.5.3. By Parts (1) and (2) of that lemma we
know that P ∪ Q = [a,b], and P ∩ Q = ∅, and a < b, and A ∩ [a,b] ⊆ P, and a ∈ P, and
b ∈ Q.

Let x ∈ P. By Lemma 3.5.3 (4) there is some d_x ∈ P such that x < d_x. Let c_x = x − 1.
Then x ∈ (c_x, d_x). Let u ∈ [a,b] ∩ (c_x, d_x). Then u < d_x. Because d_x ∈ P, it follows
from Lemma 3.5.3 (3) that u ∈ P. Hence [a,b] ∩ (c_x, d_x) ⊆ P. Let z ∈ Q. By a similar
argument there is an open interval (s_z, t_z) in F such that s_z ∈ Q, and z ∈ (s_z, t_z), and
[a,b] ∩ (s_z, t_z) ⊆ Q.

Because P ∪ Q = [a,b], then

\[
[a,b] \subseteq \bigcup_{x \in P} (c_x, d_x) \cup \bigcup_{z \in Q} (s_z, t_z).
\]

Let p, q ∈ N, and x_1, x_2, ..., x_p ∈ P, and z_1, z_2, ..., z_q ∈ Q. We claim that

\[
[a,b] \nsubseteq \bigcup_{i=1}^{p} (c_{x_i}, d_{x_i}) \cup \bigcup_{j=1}^{q} (s_{z_j}, t_{z_j}).
\]

Let d = max{d_{x_1}, d_{x_2}, ..., d_{x_p}}. Then d ∈ P. By Lemma 3.5.3 (4) there is some w ∈ P
such that d < w. Then

\[
w \notin \bigcup_{i=1}^{p} (c_{x_i}, d_{x_i}).
\]

By Lemma 3.5.3 (3) we know that w < s_{z_j} for all j ∈ {1, ..., q}. Then

\[
w \notin \bigcup_{j=1}^{q} (s_{z_j}, t_{z_j}).
\]

It follows that

\[
w \notin \bigcup_{i=1}^{p} (c_{x_i}, d_{x_i}) \cup \bigcup_{j=1}^{q} (s_{z_j}, t_{z_j}),
\]

which proves the claim. We have therefore seen that the family {(c_x, d_x)}_{x ∈ P} ∪
{(s_z, t_z)}_{z ∈ Q} of open intervals satisfies the hypothesis of the Heine–Borel Theorem,
but not the conclusion of the theorem. Therefore the Heine–Borel Theorem does not
hold for F.

Next, let f: [a,b] → F be defined by

\[
f(x) = \begin{cases} x, & \text{if } x \in P \\ a - 1, & \text{if } x \in Q. \end{cases}
\]

Let v ∈ [a,b]. We will show that f is continuous at v. There are two cases. First,
suppose that v ∈ P. Then v ∈ (c_v, d_v), and [a,b] ∩ (c_v, d_v) ⊆ P. Because (c_v, d_v) is an open
interval, by Lemma 2.3.7 (2) there is some δ > 0 such that (v − δ, v + δ) ⊆ (c_v, d_v).
It follows that [a,b] ∩ (v − δ, v + δ) ⊆ P. Let g = f|_{[a,b] ∩ (v − δ, v + δ)}. Then g(x) = x for
all x ∈ [a,b] ∩ (v − δ, v + δ). Therefore g is continuous at v by Example 3.3.3 (1) and
Exercise 3.3.2 (1), neither of which relies upon the Least Upper Bound Property, and
hence both work for F. It now follows from Exercise 3.3.2 (1) that f is continuous
at v, where, again, the exercise does not rely upon the Least Upper Bound Property.
A similar argument works when v ∈ Q, and we omit the details. We deduce that f is
continuous.

If x ∈ Q, then f(x) = a − 1 < a = f(a), and if x ∈ P, then by Lemma 3.5.3 (4)
there is some y ∈ P such that x < y, and hence f(x) = x < y = f(y). It follows that
there is no x_max ∈ [a,b] such that f(x) ≤ f(x_max) for all x ∈ [a,b]. Hence the Extreme
Value Theorem does not hold for F.

Finally, let g: [a,b] → F be defined by

\[
g(x) = \begin{cases} 0, & \text{if } x \in P \\ 1, & \text{if } x \in Q. \end{cases}
\]

A similar argument to the one used with the function f shows that g is continuous.
On the other hand, by the arguments used in Section 2.4, all of which apply to F, we
know that ℚ ⊆ F (where ℚ denotes the rational numbers), and in particular that
1/2 ∈ F. Hence the Intermediate Value Theorem does not hold for F.
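For a concrete instance of the last construction, one may take F to be the rational numbers, with A the set of rational numbers y such that y·y < 2, and with a = 1 and b = 2. The following is a minimal sketch under those assumptions, using exact rational arithmetic; the names in_Q, g and squeeze are hypothetical and are introduced only for illustration. In this instance a number x ∈ [1,2] is an upper bound of A precisely when x·x > 2, because no rational number has square equal to 2.

```python
from fractions import Fraction

# Assumed concrete instance: F is the rationals, A = {y : y*y < 2},
# a = 1 is an element of A, and b = 2 is an upper bound of A.
A_POINT = Fraction(1)
B_POINT = Fraction(2)


def in_Q(x: Fraction) -> bool:
    """True if x in [a, b] is an upper bound of A, that is, if x*x > 2."""
    if not (A_POINT <= x <= B_POINT):
        raise ValueError("x must lie in [a, b]")
    return x * x > 2


def g(x: Fraction) -> Fraction:
    """The function g of the proof: 0 on P and 1 on Q."""
    return Fraction(1) if in_Q(x) else Fraction(0)


def squeeze(steps: int):
    """Halve [p, q] repeatedly, keeping p in P and q in Q."""
    p, q = A_POINT, B_POINT
    for _ in range(steps):
        mid = (p + q) / 2
        if in_Q(mid):
            q = mid
        else:
            p = mid
    return p, q


# g takes only the values 0 and 1 on a sample of rational points of [1, 2].
values = {g(A_POINT + Fraction(k, 1000)) for k in range(1001)}

# Points of P and Q can be arbitrarily close, yet g still jumps from 0 to 1.
p, q = squeeze(30)
```

The sketch does not verify that g is continuous, which is the content of the argument above; it only illustrates that g(a) = 0 and g(b) = 1 while the intermediate value 1/2 is never attained at the sampled points.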


Because of Theorem 3.5.4, the reader might jump to the conclusion that all the 
important theorems of real analysis that make use of the Least Upper Bound Property 
in their proofs are equivalent to this property, but that is not the case. For example, the 
Archimedean Property (Theorem 2.6.7) is a very important and useful theorem, and 
its proof makes use of the Least Upper Bound Property, but the Archimedean Property 
is in fact not equivalent to the Least Upper Bound Property, for the following reason. 
It was proved in Exercise 2.4.10 that the Archimedean Property holds for the rational 
numbers, and it therefore follows that the axiom for an ordered field together with 
the Archimedean Property cannot imply the Least Upper Bound Property, because 
the latter property is not satisfied by the rational numbers. However, we note that our 
use of the Least Upper Bound Property in the proof of the Archimedean Property 
for the real numbers was necessary, and was not simply a matter of convenience. As 
mentioned in Section 2.6, there exist ordered fields that do not satisfy the Archimedean 
Property, and hence any proof of the Archimedean Property for the real numbers must 
ultimately rely upon some aspect of the real numbers beyond the axiom for an ordered 
field, and the only axiom for the real numbers other than that of an ordered field is the 
Least Upper Bound Property. 


Reflections 


The two theorems referred to in the title of this section, the Extreme Value 
Theorem and the Intermediate Value Theorem, are models of why we study real
analysis. Both theorems are reasonably clear intuitively, and yet they are rather
difficult to prove. It would be nice if every proof of every theorem were not only 
simple, but also provided a clear insight into why the theorem is true. Unfortunately, 
it happens regularly in mathematics that theorems that seem to be intuitively true 
have no known simple or direct proof. Of course, we cannot dispense with uninviting 
proofs when no better ones are available. 

The final result in this section, which shows that the Least Upper Bound Property 
is equivalent to the Extreme Value Theorem, the Intermediate Value Theorem and the 
Heine–Borel Theorem, is an example of what is known in the mathematical world as
a “folk theorem,” which is a result that everyone knows is true, but the proof of which 
is either not written down anywhere, or is written down somewhere but is not widely 
known. The author has many times over the years told his students in real analysis 
courses that these three theorems (and some other theorems too, as seen at the end of 
Section 8.3) are logically equivalent to the Least Upper Bound Property, but it was 
only during the writing of this text that the author realized that he had never actually 
seen a proof of this equivalence, and that if he wanted to continue making such a 
claim, he would need to figure out a proof, which is the one given in this section. The 
author subsequently found a similar proof in [Olm62, Appendix Sections 1-3]. 


Exercises 


Exercise 3.5.1. 


(1) Find an example of a function f: [0,1] → R such that f is not continuous, but
that f satisfies the conclusion of the Extreme Value Theorem.

(2) Find an example of a function f: [0,1] → R such that f is not continuous, but
that f satisfies the conclusion of the Intermediate Value Theorem.


Exercise 3.5.2. Let [a,b] ⊆ R be a closed bounded interval, let c ∈ (a,b) and let
f: [a,b] → R be a function. Suppose that f|_{[a,c]} and f|_{[c,b]} both satisfy the conclusion
of the Intermediate Value Theorem. Prove that f satisfies the conclusion of the
Intermediate Value Theorem.


Exercise 3.5.3. Let [a,b] ⊆ R be a closed bounded interval, let k ∈ R and let
f: [a,b] → R be a function. Suppose that f satisfies the conclusion of the Intermediate
Value Theorem. Prove that kf satisfies the conclusion of the Intermediate
Value Theorem.


Exercise 3.5.4. Let [a,b] ⊆ R be a closed bounded interval, and let f: [a,b] → [a,b]
be a function. Suppose that f is continuous. Prove that there is some c ∈ [a,b] such
that f(c) = c. The number c is called a fixed point of f.


Exercise 3.5.5. Let [a,b] ⊆ R be a closed bounded interval, and let f: [a,b] → R
be a function. Suppose that f is continuous. Prove that f([a,b]) is a closed bounded
interval.


Exercise 3.5.6. [Used in Section 2.6, Example 4.6.3, Section 7.1, Exercise 7.2.11 and
Example 10.2.8.] In this exercise we use the Intermediate Value Theorem to prove
that every positive real number has an nth root, for every n ∈ N. More precisely, let
x ∈ (0,∞), and let n ∈ N. We will show that there is a unique z ∈ (0,∞) such that
z^n = x. The number z is called the nth root of x, and is denoted ⁿ√x. The square root
of x, denoted √x, is another name for ²√x.

If n = 1 this result is trivial, so we suppose that n > 1.

(1) Let a, b ∈ (0,∞). Suppose that a < b. Prove that a^n < b^n.

(2) Let a ∈ (1,∞). Prove that 1 < a < a^n.

(3) Let a ∈ (0,1). Prove that 0 < a^n < 1.

(4) Prove that if ⁿ√x exists, then it is unique.

(5) Prove that if x > 1, then ⁿ√x exists.

(6) Prove that if 0 < x < 1, then ⁿ√x exists.


Exercise 3.5.7. Let p: R → R be a polynomial function. Suppose that p has odd
degree. The purpose of this exercise is to prove that p has a root. This fact is often 
stated in calculus courses, where it is justified by referring to the Intermediate Value 
Theorem, and citing the (usually unproved) fact that “if x gets very large then the 
highest degree term dominates the other terms in the polynomial.” The Intermediate 
Value Theorem (Theorem 3.5.2) is certainly needed here, but we avoid the informal 
fact cited above as follows. 

Suppose that p has the form p(x) = a_0 + a_1 x + ··· + a_n x^n for all x ∈ R, for some
n ∈ N ∪ {0} and a_0, a_1, ..., a_n ∈ R. Suppose that n is odd, and that a_n ≠ 0.

(1) Prove that there is some c ∈ (0,∞) such that |p(c)/(a_n c^n) − 1| < 1 and
|p(−c)/(a_n (−c)^n) − 1| < 1.

(2) Use Part (1) of this exercise to prove that there is some r ∈ [−c,c] such that
p(r) = 0.


Exercise 3.5.8. [Used in Exercise 10.4.12.] Let [a,b] ⊆ R be a closed bounded interval,
let f: [a,b] → R be a function and let r ∈ R. Suppose that f is continuous, and that
f(a) < r < f(b). Let S = {x ∈ [a,b] | f(x) ≥ r}. Prove that S ≠ ∅, that glb S ∈ (a,b]
and that f(glb S) ≥ r.

The proof may be simplified as follows. Let g: [a,b] → R be defined by g(x) =
f(x) − r for all x ∈ [a,b]. Then g(a) < 0 < g(b), and S = {x ∈ [a,b] | g(x) ≥ 0}, and
g is continuous. Hence it suffices to prove the desired result for g instead of f, where
we replace r with 0. [Use Exercise 3.3.4 (2).]


3.6 Historical Remarks 


From the modern perspective, calculus makes essential use of the concept of the 
limit of a function and the related concept of a continuous function. Historically, 
however, the early development of calculus relied upon the somewhat related notion 
of infinitesimals rather than the later-developed idea of a limit. An infinitesimal is 
an “infinitely small” but positive number; that is, a positive number that is smaller 
than any positive real number. In the standard approach to the real numbers and real
analysis that is used today, for example as discussed in this text, infinitesimals do not
exist (as is proved in Lemma 2.3.10). However, the intuitive notion of infinitesimals 
was crucial to the development of calculus, whether or not any such thing exists. The 
idea of an infinitely small number is related to the ideas of infinitely large numbers 
and infinite processes; hence, our discussion of the history of the concepts of limits 
and continuity commences with a brief mention of the ancient Greek approach to 
the infinite, which was influential for many centuries in Europe. The development of 
calculus, though in part spurred on by ancient Greek successes, is coincident with the 
waning of the influence of the ancient Greek way of doing mathematics. 


Ancient World 


Infinitesimals arose, in a non-mathematical way, in ancient Greek thought via the 
approach of atomists such as Democritus of Abdera (c. 460-c. 370 BCE). The atomist 
approach was formulated as an attempt to resolve the four arguments of Zeno of 
Elea (c. 490-c. 425 BCE), who wanted to show that there is no motion. Aristotle 
(384-322 BCE) rejected infinitely small indivisibles as part of his attempt to refute 
Zeno’s arguments; these arguments are known today via their summary and counter- 
arguments in Book VI of Aristotle’s Physics. Aristotle rejected the infinitely large, 
though he accepted the infinite existing as a potential. One of his arguments against 
the infinitely large was based upon cardinalities of sets, where he assumed that a 
proper subset always has smaller cardinality than the original set, and from that he 
ruled out infinite sets (a correct deduction, but a false assumption). 

Archimedes (287-212 BCE) proved various area and volume results using the
method of exhaustion, which avoids taking limits or using indivisibles by using a 
double reductio ad absurdum (proof by contradiction). Such proofs give no clue 
as to how these results were first discovered. However, in the Method, which was 
found only in 1899, Archimedes explained his method of discovery, which was 
based upon mechanical ideas. As part of his method of discovery, which he was 
careful to state was no substitute for a proper proof, Archimedes used the idea that 
the region of the plane or space is made up of indivisible sections; because this 
use of indivisibles was stated without explanation, it might have been the case that 
such an informal use of indivisibles for discovery (though not proof) was known to 
Archimedes’ contemporaries. 


Medieval Period 


When explaining his graphical representation of certain functions, Nicole Oresme 
(1323-1382), around 1350, had the view that measurable quantities (excluding whole 
numbers) vary continuously, and he suggested the idea of a mathematical indivisible 
(though he acknowledged that physical indivisibles did not exist). Both of these ideas 
would be used by later mathematicians in the development of calculus. 

Nicholas of Cusa (1401-1464) helped bring infinitesimals and the infinite into 
mathematics. For example, he viewed a circle as a polygon with infinitely many sides, 
and used that to find the area of a circle, an approach later used by Stevin. He did 
not contribute any mathematical work that was important per se, but his approach
may well have influenced later mathematicians in their use of the infinite, for example
Kepler. 

The rise of Platonism (helped by Nicholas of Cusa, among others), and the 
corresponding decline in the influence of Aristotle, allowed for ideas to be established 
by the intellect, rather than being justified solely on the basis of empirical observation. 
This idea that mathematics is independent of empirical observation, or possibly prior 
to it, helped allow for the acceptance of speculative ideas such as the infinite and the 
infinitesimal, as long as the result of such speculation did not lead to problematic 
results. Such an approach in general, and the liberation from Aristotle’s rejection of 
the infinite in particular, was crucial for the development of calculus. 


Renaissance 


Ancient Greek mathematical texts in Latin translation became widely known in 
Europe in the 16th century, and elicited great interest. One such text was On the 
Equilibrium of Planes by Archimedes, which is about centers of gravity. Simon Stevin 
(1548-1620) made a significant step forward in the development of limits in his study 
of centers of gravity in De Beghinselen der Weeghconst of 1586. When approximating 
centers of gravity of curved regions by using inscribed polygons, rather than using 
the cumbersome (though logical) reductio ad absurdum argument of Archimedes, 
which was used to avoid taking limits while approximating regions by polygons and 
polyhedra with ever more sides, Stevin tried to simplify the limit argument by saying 
that if two quantities differ they do so by a finite amount, and therefore in order to 
show that two quantities are equal it suffices to show that they differ by less than any 
finite amount. (This fact, in modern notation, follows from Lemma 2.3.10.) 

In his study of hydrostatics De Beghinselen des Waterwichts of 1586, Stevin used 
what he called “proof by means of numbers,” where he showed that the average 
pressure on a vertical square wall of a vessel full of water corresponds to the pressure 
at the midpoint by working through an example, in which he subdivided the wall into 
n horizontal strips (first n = 4, then 10, then 1000), and he then showed that in each 
case the answer is between 1/2 − 1/(2n) and 1/2 + 1/(2n). He noted that the same idea holds in
general, and concluded that the difference between the actual pressure and 1/2 can be
made smaller than any desired quantity, and so the pressure is 1/2. This approach is
similar to a limit, though Stevin did not have the general definition of that concept, 
and it seems that he did not actually believe in infinite processes. He said that he 
preferred the ancient Greek approach, and that his method was only an illustration of 
his results, not a proof. Nonetheless, Stevin’s work helped promote the idea of limits, 
and helped move away from the Archimedean reductio ad absurdum, although the 
subsequent rise of Cavalieri’s use of indivisibles temporarily moved mathematicians 
away from the limit idea. Stevin’s work may have had some influence on Kepler, 
Cavalieri and Grégoire de Saint-Vincent.

Another person who tried to use some sort of limit idea instead of reductio ad 
absurdum as part of an attempt to solve some center of gravity problems was Luca 
Valerio (1552-1618) in 1604. He was probably unaware of the earlier work of Stevin, 
but had a similar approach in that he did not explicitly think of limits, but rather stated
some propositions that allowed him to avoid the details of reductio ad absurdum.
Without an adequate notion of functions, however, he could not completely succeed.
Cavalieri and Grégoire de Saint-Vincent were familiar with Valerio’s work.

The work of people such as Stevin and Valerio was, in a sense, a continuation of the 
ancient Greek approach, rather than the new approach that was to be developed in the 
17th century. Stevin and Valerio, in the course of trying to simplify the Archimedean
approach, were responding to that approach, as opposed to subsequent work which was 
focused more on discovering new results and developing computational techniques 
than on formulating proofs, and which was less in reference to Archimedes. In the 
19th century there was a move to regain rigor once again, though it was entirely 
different from the ancient Greek approach, and this time was very much based upon 
limits, as people such as Stevin and Valerio wanted to do, though not at all using their 
specific methods. 


Seventeenth Century 


Perhaps the first person to deal explicitly with the type of limit that earlier
mathematicians such as Stevin and Valerio, and perhaps Archimedes, had in mind but did not
state explicitly was Grégoire de Saint-Vincent (1584-1667) in 1647. In contrast to
those earlier mathematicians, who subdivided regions only as much as was needed 
to get the error less than a given amount, which means that only finite subdivisions 
were used, Grégoire de Saint-Vincent thought of using an actual infinite subdivision, 
which led him close to the idea of the limit of an infinite process, though he was not 
rigorous from a modern perspective. 

In contrast to Grégoire de Saint-Vincent’s informal idea of limits, a more common 
approach in the 17th century as a replacement for the reductio ad absurdum approach 
of Archimedes was the use of infinitesimals or indivisibles. These two notions, though 
used somewhat similarly, are not entirely the same; infinitesimals are infinitely small 
parts of a region but have the same number of dimensions as the whole region, whereas 
indivisibles are one dimension lower. One of the first to use this type of approach for 
finding areas and volumes was Johannes Kepler (1571-1630). In Nova stereometria 
doliorum vinariorum of 1615, Kepler computed volumes of solids of revolution. This 
work, which was meant to be of practical use for finding volumes of wine casks, 
focused on getting results rather than using Archimedean niceties, and in it Kepler 
used infinitesimals freely to obtain his results. For example, Kepler found the area of 
a circle by cutting it up into infinitely many infinitesimally thin triangles meeting at 
the center of the circle, and then rearranging them into a single triangle (the idea of 
viewing the circle as made up of infinitely many triangles was not new, being found, 
for example, in the work of Nicholas of Cusa and Viéte). 

Galileo Galilei (1564-1642) never published a work on indivisibles per se, but
he used indivisibles and the infinite in his landmark work Discorsi e dimostrazioni 
matematiche, intorno a due nuove scienze of 1638 (often referred to as Two New 
Sciences). He was probably influenced by Kepler. Galileo, who was also influenced 
by the scholastics in his approach to the infinite, warned in Two New Sciences against 
treating the infinite the same way as the finite. 

Bonaventura Cavalieri (1598-1647), a pupil of Galileo, wrote two important
works using indivisibles, Geometria indivisibilibus continuorum nova quadam ratione 
promota of 1635 and Exercitationes geometricae sex of 1647. The former, which was 
entirely about the use of indivisibles, and was the first such book, received a lot of 
attention and was widely discussed. Cavalieri took the concept of indivisible, which 
is less than clear intuitively, and made it into a workable tool for finding areas and 
volumes. He viewed planar regions as made up of infinitely many slices by parallel 
lines, and solid regions as made up of infinitely many slices by parallel planes, and 
his main idea was that to compare the areas or volumes of two regions, it suffices to 
compare their slices. Cavalieri’s explanations were not very clear, and he seems to 
have confused indivisibles in the mathematical and physical senses. On the one hand, 
Cavalieri made progress via his use of indivisibles to obtain many geometric results. 
On the other hand, Cavalieri’s approach, while producing new results, harked back 
to the indivisibles of Oresme and other medieval scholars, rather than forward to the 
limit concept that would be developed later. 

In contrast to Aristotle, who had denied the existence of the infinitely small 
though he accepted the idea of the infinitely large as a potential, Blaise Pascal (1623- 
1662) viewed the infinitely small as complementary to the infinitely large, just as the 
reciprocal of a very large number is very small. 

John Wallis (1616-1703), who was the first person to use the symbol “$\infty$” to
denote infinity, looked at parallelograms with thickness $\frac{1}{\infty}$, which seems to have
been both non-zero and zero as needed. He found areas and volumes arithmetically
rather than geometrically. He showed, using an unproved analogy, what we write
as $\int_0^1 x^n \, dx = \frac{1}{n+1}$ for all $n \in \mathbb{N}$, and then claimed that the result was true for all
$n \in \mathbb{R} - \{-1\}$ by an appeal to “interpolation and induction.”


Newton and Leibniz 


One of the most remarkable aspects of the invention of calculus is that even though its 
inventors, Isaac Newton (1643-1727) and Gottfried von Leibniz (1646-1716), did not 
have our modern tools for mathematical rigor, they got all the basic ideas right. The 
notion of a function did not yet exist at the time of Newton and Leibniz; they thought 
about curves in the plane from a geometric point of view. The notion of a limit as 
we now know it came even later than the notion of a function, and was certainly not 
available to Newton and Leibniz. Instead of using limits, the originators of calculus 
used infinitesimals. 

The role of infinitesimals in calculus can be seen in the calculation of derivatives. 
Today we calculate derivatives via the limit $\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. If one does not have the
notion of limit, it would be possible to evaluate the fraction $\frac{f(x+h) - f(x)}{h}$ by thinking
of $h$ as an infinitesimally small but still positive number. Because $h$ is non-zero we
can divide by it, but because it is infinitesimally small we can think of it as negligible
in comparison to any real number. For example, if we let $f(x) = x^2$, we first obtain
$\frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h$, and we then drop $h$ to obtain $2x$. Such calculations,
which lead to the right answers but are on shaky foundational grounds, were done
routinely in the early development of calculus. 
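
The algebra in this example can be checked with exact rational arithmetic. The following Python sketch is an illustration added for the modern reader and is not part of the historical record; the names `square` and `difference_quotient` are hypothetical. It confirms that for $f(x) = x^2$ the quotient equals $2x + h$ exactly for each non-zero $h$ tried, so that "dropping $h$" amounts to asking what $2x + h$ approaches as $h$ shrinks.

```python
from fractions import Fraction

def square(x):
    """The function f(x) = x^2 from the example above."""
    return x * x

def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h; h must be non-zero so that division is legal."""
    if h == 0:
        raise ZeroDivisionError("h must be non-zero")
    return (f(x + h) - f(x)) / h

x = Fraction(3)
for k in range(1, 6):
    h = Fraction(1, 10 ** k)           # a small but non-zero rational h
    q = difference_quotient(square, x, h)
    assert q == 2 * x + h              # exact identity, no rounding involved
    print(h, q)
```

Using `Fraction` rather than floating-point numbers keeps every step exact, so the check reflects the algebraic identity and not a numerical approximation.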

Newton’s initial versions of calculus used infinitesimals, but subsequently his 
approach turned away from them and toward something approaching the idea of a limit. 
Indeed, Newton’s understanding of infinitesimals and limits was rather sophisticated 
given the general state of mathematical development of his era. Newton’s most 
important published work was Philosophiae Naturalis Principia Mathematica of 1687
(often referred to as the Principia). This work set forth Newton’s theory of mechanics 
and gravitation, and is considered one of the most important texts in the history of 
science. The Principia did not use the calculus that Newton had previously developed, 
and phrased everything in terms of classical Euclidean geometry, though it is possible 
to find in the Principia an understanding of Newton’s approach to infinitesimals and 
limits. Some people suggest that Newton worked out everything in the Principia in 
terms of fluxions (as he called his version of derivatives), and then redid it in classical 
terms to avoid controversy; others dispute this conjecture, suggesting that Newton 
ultimately preferred the classical approach and was uncomfortable with infinitesimals, 
and also that calculus was not yet sufficiently developed for Newton’s purposes in the 
Principia. 

Lemma I of Book I of the Principia states 


“Quantities, and the ratio’s of quantities, which in any finite time converge 
continually to equality, and before the end of that time approach nearer the 
one to the other than by any given difference, become ultimately equal.” 


(Quotes from the Principia are from [New].) In this lemma, Newton attempted to 
formulate the notion of a limit, at least in a special case (see Exercise 3.2.13 for a 
modern statement of what Newton appears to be suggesting in this lemma). What 
matters is not the details of this lemma, but Newton’s phrase “nearer the one to the 
other than by any given difference,” which has a strong resemblance to the role of $\varepsilon$
in the modern definition of limits, though Newton did not give a definition of what he 
meant by this sort of limit. 

Newton stated his philosophical approach to limits and infinitesimals in the 
Scholium at the end of Section I of Book I of the Principia, where we find 


“These lemmas are premised, to avoid the tediousness of deducing perplexed 
demonstrations ad absurdum, according to the method of the ancient geome- 
ters. For demonstrations are more contracted by the method of indivisibles: 
But because the hypothesis of indivisibles seems somewhat harsh, and there- 
fore that method is reckoned less geometrical; I chose rather to reduce the 
demonstrations of the following propositions to the first and last sums and 
ratio’s of nascent and evanescent quantities, that is, to the limits of those 
sums and ratio’s; and so to premise, as short as I could, the demonstrations 
of those limits. For hereby the same thing is perform’d as by the method of 
indivisibles; and now those principles being demonstrated, we may use them 
with more safety.” 


Newton understood that philosophical objections might be raised to the above-quoted
ideas, and he argued for his view by comparing limits to instantaneous velocity. 

Leibniz based his approach to derivatives on the differentials dx and dy, and he 
appeared to be ambivalent about whether or not to think of dx and dy as infinitesimals. 
He attempted to avoid the question by saying that if one wanted to, one could take dx 
and dy to be real numbers that are as small as desired, and then the errors obtained 
when expressions such as $(dx)^2$ were dropped could be made to be within any given
tolerance, which hints at our modern notion of limits; that one could rework everything 
involving infinitesimals in terms of the method of exhaustion, though he did not 
actually do so; and that infinitesimals are useful in solving problems, which makes 
them worthwhile as a useful fiction. However, even though Leibniz was ambivalent 
about the existence of infinitesimals, he seemed to think that they obeyed certain rules, 
and could be used properly. Leibniz did not develop an approach resembling limits as 
did Newton. 


Eighteenth Century 


In contrast to Leibniz’s ambivalence about infinitesimals, some of his important 
successors, such as Jakob Bernoulli (1654-1705), Johann Bernoulli (1667-1748) and 
Leonhard Euler (1707-1783), had no qualms about the existence of infinitesimals. 

Between the mid-18th century and the first quarter of the 19th century, questions 
arose as to the requirements of functions being continuous or not, and smooth or 
not. In Euler’s influential textbook Introductio in analysin infinitorum of 1748, the
only functions considered were those given by single formulas made up from the 
standard elementary functions by the usual ways of combining functions. However, 
in response to the question of the permissible initial shape of a string allowed in the 
study of vibrating strings via partial differential equations, Euler expanded his notion 
of functions and allowed piecewise smooth ones, though he assumed that functions 
were continuous, except possibly at isolated points, which were ignored. 

Continuity in the 18th century was not just a geometric notion, but a more general 
idea of going through all intermediate states, or of gradual change. In an effort to 
clarify what continuity was, Louis Arbogast (1759-1803), in 1791, focused on the 
inadmissibility of functions that jump abruptly. He invoked the idea of a function 
obtaining all “the successive values” between two values; that is, he hinted at the 
Intermediate Value Theorem. 


Nineteenth Century 


By the early 19th century the previous notion of continuity was challenged. Joseph 
Fourier (1768-1830), in Théorie analytique de la chaleur of 1822, studied what we 
now call Fourier series, and stated that some discontinuous and some non-differentiable
functions should be considered. In 1829 Lejeune Dirichlet (1805-1859) described an 
example of a function that did not satisfy conditions that imply the convergence of 
Fourier series (this function is given in Example 3.3.3 (6)). All this work stressed the 
need for a good definition of continuity. 

Carl Friedrich Gauss (1777-1855) had a new view of infinitesimals. In the 18th
century it was held that infinitesimals behaved similarly to real numbers (except that 
the cancellation law did not hold), and hence infinitesimals could be manipulated 
accordingly. Gauss, by contrast, said that caution was needed when using infinite 
quantities, and that they should be used only if their use can be viewed in terms 
of limits. However, Gauss’ approach was not completely rigorous; for example, he 
implicitly (and without proof) used the Monotone Convergence Theorem. 

The first person to give essentially the modern formulation of continuity, phrased in
terms of $f(x+h) - f(x)$ becoming as small as desired if $h$ is sufficiently small (though
without the $\varepsilon$-$\delta$ formulation), was Bernard Bolzano (1781-1848) in 1817. Bolzano’s
goal was to prove the Intermediate Value Theorem, and he defined continuity along the 
way. His approach was not completely rigorous because he did not have the axioms for 
the real numbers, but it was a significant step forward nonetheless. However, this work, 
which was privately published, was not widely seen by Bolzano’s contemporaries. 

The first person to make rigor in these matters important and to have wide influ- 
ence was Augustin Louis Cauchy (1789-1857), who apparently was not aware of 
Bolzano’s work. Cauchy wrote three important textbooks on analysis, Cours d’analyse
de l’École Royale Polytechnique of 1821, Résumé des leçons données à l’École Royale
Polytechnique sur le calcul infinitésimal of 1823 and Leçons sur le calcul différentiel
of 1829. These texts were the first to promote rigor as an important goal in real analy- 
sis. Cauchy tried to replace infinitesimals with functions whose limits are zero. His 
definition of continuity, phrased in terms of $f(x+h) - f(x)$ decreasing indefinitely
with h, is similar to Bolzano’s definition, though slightly less satisfactory. Cauchy 
showed that various elementary functions (for example sine) are continuous, and he 
gave a proof of the Intermediate Value Theorem using sequences. Cauchy was the first 
person to use the symbols $\varepsilon$ and $\delta$ in their now familiar roles, though that was in the
proof of a theorem, and not in his definitions of limits and continuity. Cauchy looked 
at the continuity of a function on an entire interval, not pointwise as we do today. 
Cauchy’s work was a major step forward in the development of the modern approach 
to limits and continuity, and to rigorous proofs in real analysis in general, though it 
was not completely rigorous by modern standards. For example, he still referred to 
infinitesimals as if they were numbers, though no such theory of infinitesimals was 
developed at the time, and he glossed over the difference between continuity and 
uniform continuity. 

Karl Weierstrass (1815-1897) changed the view of limits from the previous notion 
of a “variable approaching” something to a static view where the “x” in “f(x)” is 
a member of what we now call a set. Weierstrass gave what amounts to the $\varepsilon$-$\delta$
definition of continuity; he used $\varepsilon$ as we now use it, though he used an interval in the
domain rather than $\delta$. It can be said that Weierstrass ended the use of infinitesimals in
real analysis. Eduard Heine (1821-1881), based upon lectures of Weierstrass, was
the first person to use the $\varepsilon$-$\delta$ definition of continuity as we do now (though he used
$\eta$ rather than $\delta$). Additionally, Heine distinguished in 1872 between continuous and
uniformly continuous functions, and showed that a continuous function on a closed 
bounded interval is uniformly continuous, a result Dirichlet had formulated in 1854 
but did not prove. Heine also published the first proof of the Extreme Value Theorem. 

Interestingly, whereas the idea of variability had been banished from Greek
mathematics because it led to Zeno’s paradoxes, it was precisely this concept which, 
revived in the later Middle Ages and represented geometrically, was a factor in the 
invention of calculus in the 17th century. On the other hand, once calculus was 
invented, and an attempt was made in the two centuries following that invention to put 
calculus on a rigorous basis, the idea of variability was once again banished (see the 
paragraph following Definition 2.5.10 for a modern approach to the idea of variables). 


Twentieth Century 


Although infinitesimals were expelled from the rigorous treatment of real analysis by 
the time of Weierstrass, in 1960 Abraham Robinson (1918-1974) constructed a system 
of numbers that included the real numbers as well as infinitesimals and infinitely large 
numbers, and he showed that calculus can in fact be done rigorously via infinitesimals 
after all. Such an approach, which is known as “non-standard analysis,” can be used 
as an alternative treatment of real analysis that completely avoids limits, though it has 
not caught on as a popular replacement for the standard approach to real analysis (as 
found in the present text). 


4 


Differentiation 


4.1 Introduction 


Having now finished all of our technical preliminaries (that is, proofs of the needed 
facts about the real numbers, limits and continuity), we are finally ready for what this 
text is really about, which is an advanced look at the core material taught in a single- 
variable calculus course. It has taken us this long to get to material from calculus 
because limits are at the heart of what makes calculus work, even in theorems that do 
not explicitly mention limits, and to make limits work, we needed the properties of 
the real numbers. 

It is assumed that the reader has seen derivatives in a calculus course, and so we 
will go straight to the technical details, without spending time on intuitive motivation, 
applications or computational examples. 

We need one preliminary technical comment before we commence our study of 
derivatives. In its most basic form, the definition of derivatives is for functions with 
domains that are non-degenerate open intervals in R; such open intervals need not be 
bounded, and can in fact be all of R. Using one-sided limits it is also possible to define 
derivatives on closed intervals, and we will do so when needed, but fundamentally 
derivatives are about open intervals. 


4.2 The Derivative 


Although it had a surprisingly long historical wait until it appeared, the definition of 
the derivative is, from our modern point of view, quite straightforward. Intuitively, 
derivatives are used to resolve two issues: the rate of change of a function, and the 
slope of the tangent line to a curve. The latter issue is in fact just the geometric version 
of the former, and so these two issues are really the same. 

As the reader has certainly learned in a calculus course, the intuitive idea for 
finding the slope of a tangent line to a curve is to find the slopes of secant lines to the 
curve (that is, lines through pairs of distinct points on the curve), and then to take 
the limit of the slopes of the secant lines as the x-values of the points get closer and 
closer. If we want to make this intuitive idea rigorous, the question arises as to what
the precise definition of the tangent line to a curve is. Although the geometrical idea 
of a tangent line as the line that “just touches a curve at a given point” is simple to 
grasp intuitively, as seen in Figure 4.2.1 (i), it is not entirely obvious how to translate 
that informal notion of a tangent line into a rigorous definition. Moreover, not every 
curve has a tangent line at every point; see, for example, the origin in Figure 4.2.1 
(ii). In fact, as we will see in Section 10.5, there are continuous functions for which 
no point has a tangent line (the existence of such functions is not at all trivial, and 
the reader should not expect to be able to define such a function easily). Fortunately, 
although the notion of a tangent line is important for the intuitive motivation of the 
notion of the derivative of a function, from a rigorous point of view the material is 
treated in the opposite order; that is, we first define the derivative, via the standard 
definition using limits of quotients, and then the tangent line is simply defined to 
be the unique line through the given point on the curve and with slope equal to the 
derivative at that point. We will not be making formal use of tangent lines in this text. 


(i) (ii) 


Fig. 4.2.1. 


For a rigorous definition of derivatives we need to use limits, and that is why 
the chapter on limits precedes the present chapter. All of the hard work in making 
the definition of derivatives rigorous is contained in the rigorous definition of limits, 
and so our definition of derivatives will be very simple. Indeed, the definition of 
derivatives is one of the rare places in real analysis where we can be rigorous while 
appearing to be doing things exactly as they are done in an introductory calculus 
course. 

Suppose that $I \subseteq \mathbb{R}$ is an open interval, that $f \colon I \to \mathbb{R}$ is a function and that
$c \in I$. To find the slope of the tangent line to the graph of $f$ at $c$, if such a
tangent line exists, we choose $x \in I - \{c\}$, we compute the slope of the secant line
through the points $(c, f(c))$ and $(x, f(x))$, and we then take the limit of this slope as $x$
goes to c. See Figure 4.2.2 for such a secant line and a tangent line. We are therefore 
led to the following definition. 


Fig. 4.2.2.


Definition 4.2.1. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f \colon I \to \mathbb{R}$ be a
function.


1. The function $f$ is differentiable at $c$ if

\[ \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \]

exists; if this limit exists, it is called the derivative of $f$ at $c$, and it is denoted
$f'(c)$.

2. The function $f$ is differentiable if it is differentiable at every number in $I$. If
$f$ is differentiable, the derivative of $f$ is the function $f' \colon I \to \mathbb{R}$ whose value
at $x$ is $f'(x)$ for all $x \in I$. △


Observe that f’ is the name of the derivative function. The notation “f’(x)” denotes 
the value of the derivative function at the point x in the domain of f’. In a calculus 
course, where it is common to write “f(x)” incorrectly as the name of the function, it 
is also common to write “f’(x)” incorrectly as the name of the derivative. In this text 
we will maintain the correct distinction between “$f'$” and “$f'(x)$.”

In addition to the notation f’ for the derivative of f, there are a number of other 
notations that are used for the derivative, such as $\frac{df}{dx}$. These other notations exist for
historical reasons, and are useful in some circumstances. For our purposes the notation 
f’ is the most appropriate, and we will use it exclusively. 

The following lemma gives a standard variant formulation of the definition of the 
derivative that is often more computationally convenient than the original definition. 
To see that the limit in this lemma makes sense, suppose that $I \subseteq \mathbb{R}$ is an open
interval and that $c \in I$. Then by Lemma 2.3.7 (2) there is some $\delta > 0$ such that
$(c - \delta, c + \delta) \subseteq I$, and hence $c + h \in I$ for all $h \in (-\delta, \delta)$. Hence, if $f \colon I \to \mathbb{R}$ is a
function, then the function $G \colon (-\delta, \delta) - \{0\} \to \mathbb{R}$ defined by $G(h) = \frac{f(c+h) - f(c)}{h}$ for
all $h \in (-\delta, \delta) - \{0\}$ is well-defined, and we can take the limit of this function as $h$
goes to 0.


Lemma 4.2.2. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f \colon I \to \mathbb{R}$ be a function.
Then $f$ is differentiable at $c$ if and only if

\[ \lim_{h \to 0} \frac{f(c+h) - f(c)}{h} \]

exists, and if this limit exists it equals $f'(c)$.


Proof. Suppose that $f$ is differentiable at $c$. Let $F \colon I - \{c\} \to \mathbb{R}$ be defined by

\[ F(x) = \frac{f(x) - f(c)}{x - c} \]

for all $x \in I - \{c\}$. Then $\lim_{x \to c} F(x)$ exists and equals $f'(c)$.
Let $J = \{x - c \mid x \in I\}$, and let $g \colon J - \{0\} \to I - \{c\}$ be defined by $g(x) = x + c$
for all $x \in J - \{0\}$. By Exercise 3.2.1 and Exercise 3.2.5 we know that $\lim_{h \to 0} g(h) = c$.
It is straightforward to verify that

\[ (F \circ g)(h) = \frac{f(c+h) - f(c)}{h} \]

for all $h \in J - \{0\}$. Because $\lim_{x \to c} F(x)$ exists, Theorem 3.2.12 implies that
$\lim_{h \to 0} (F \circ g)(h)$ exists and $\lim_{h \to 0} (F \circ g)(h) = \lim_{x \to c} F(x)$, which is equivalent to saying that

\[ \lim_{h \to 0} \frac{f(c+h) - f(c)}{h} \]

exists and equals $f'(c)$.
The other implication is similar, and we omit the details.
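
The substitution used in this proof can be tried out on a concrete function. The following Python sketch is only an illustration under stated assumptions: the sample function $f(x) = x^3$, the point $c = 2$ and the helper name `h_quotient` are choices made here, not taken from the text. It checks with exact rational arithmetic that $(F \circ g)(h)$ and the quotient in Lemma 4.2.2 agree for several non-zero values of $h$.

```python
from fractions import Fraction

def f(x):
    """A sample function, chosen only for illustration."""
    return x ** 3

c = Fraction(2)

def F(x):
    """F(x) = (f(x) - f(c)) / (x - c), defined for x != c."""
    return (f(x) - f(c)) / (x - c)

def g(h):
    """g(h) = h + c, which sends h != 0 to a point different from c."""
    return h + c

def h_quotient(h):
    """(f(c + h) - f(c)) / h, the quotient appearing in Lemma 4.2.2."""
    return (f(c + h) - f(c)) / h

for k in range(1, 6):
    h = Fraction((-1) ** k, 10 ** k)   # small non-zero h of alternating sign
    assert F(g(h)) == h_quotient(h)    # the two quotients agree exactly
    print(h, h_quotient(h))            # here the quotient is 12 + 6h + h^2
```

A finite check of this kind does not replace the proof, which handles all $h$ at once via Theorem 3.2.12; it only makes the change of variable concrete.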


Example 4.2.3. 


(1) Let $I \subseteq \mathbb{R}$ be a non-degenerate open interval, and let $f \colon I \to \mathbb{R}$ be defined by
$f(x) = mx + b$ for all $x \in I$, where $m, b \in \mathbb{R}$. Let $c \in I$. By Lemma 2.3.7 (2) there is
some $\delta > 0$ such that $(c - \delta, c + \delta) \subseteq I$. Then $c + h \in I$ for all $h \in (-\delta, \delta)$, and hence
$\frac{f(c+h) - f(c)}{h}$ is defined for all $h \in (-\delta, 0) \cup (0, \delta)$. To find the derivative of $f$ at $c$, we
use Exercise 3.2.1 to see that

\[ \lim_{h \to 0} \frac{f(c+h) - f(c)}{h} = \lim_{h \to 0} \frac{[m(c+h) + b] - [mc + b]}{h} = \lim_{h \to 0} \frac{mh}{h} = \lim_{h \to 0} m = m, \]

where we can cancel the $h$ because as we take the limit as $h$ goes to zero, the number
$h$ is never equal to zero (which we need to know, because it is not possible to cancel
zero).

We therefore see that $f'(x)$ exists and equals $m$ for every $x \in I$. We can abbreviate
this derivative by writing $(mx + b)' = m$. In particular, we see that $(x)' = 1$ and
$(c)' = 0$.

(2) Let $g \colon \mathbb{R} \to \mathbb{R}$ be defined by $g(x) = x^2$ for all $x \in \mathbb{R}$. We first find the derivative
of $g$ at 3 by computing

\[ \lim_{h \to 0} \frac{g(3+h) - g(3)}{h} = \lim_{h \to 0} \frac{(3+h)^2 - 3^2}{h} = \lim_{h \to 0} \frac{6h + h^2}{h} = \lim_{h \to 0} (6 + h) = 6, \]

where the limit is found using Exercise 3.2.1. Hence $g'(3)$ exists, and $g'(3) = 6$. We
now find the derivative in general by letting $x \in \mathbb{R}$, and computing

\[ \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h) = 2x. \]

Hence $g'(x)$ exists and equals $2x$ for all $x \in \mathbb{R}$, which we abbreviate by writing
$(x^2)' = 2x$.

(3) Let $k \colon \mathbb{R} \to \mathbb{R}$ be defined by $k(x) = |x|$ for all $x \in \mathbb{R}$. We try to find the
derivative of $k$ at 0 by computing

\[ \lim_{h \to 0} \frac{k(0+h) - k(0)}{h} = \lim_{h \to 0} \frac{|0+h| - |0|}{h} = \lim_{h \to 0} \frac{|h|}{h}. \]

We saw in Example 3.2.16 that this last limit does not exist. Hence $k$ is not differentiable
at 0. This lack of differentiability corresponds to the “corner” in the graph of
$y = |x|$ at $x = 0$.

On the other hand, the function $k$ is differentiable at all $x \in \mathbb{R} - \{0\}$, as the graph
of $y = |x|$ would suggest. Let $x \in (0, \infty)$. If $h \in \mathbb{R}$ and $h$ is sufficiently close to zero,
then $x + h > 0$. Hence

\[ \lim_{h \to 0} \frac{k(x+h) - k(x)}{h} = \lim_{h \to 0} \frac{|x+h| - |x|}{h} = \lim_{h \to 0} \frac{(x+h) - x}{h} = \lim_{h \to 0} \frac{h}{h} = \lim_{h \to 0} 1 = 1. \]

Hence $k'(x)$ exists, and $k'(x) = 1$. A similar computation shows that if $x \in (-\infty, 0)$,
then $k'(x) = -1$.

By combining the above observations with Exercise 3.3.1 (2), we see that a
function can be continuous everywhere but not differentiable everywhere. ◊
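
The computations in Parts (2) and (3) of Example 4.2.3 can be mirrored numerically. The following Python sketch is an informal illustration, not a proof; the helper name `difference_quotient` and the particular step sizes are assumptions made here. It evaluates the quotient $\frac{f(c+h) - f(c)}{h}$ for $x^2$ at 3 and for $|x|$ at 0 and at 1, using steps that are powers of two so that the floating-point arithmetic happens to be exact in these cases.

```python
def difference_quotient(f, c, h):
    """Return (f(c + h) - f(c)) / h for a non-zero step h."""
    return (f(c + h) - f(c)) / h

# Steps h = 1/2, 1/4, ..., 1/1024; powers of two keep the float arithmetic exact here.
steps = [2.0 ** -k for k in range(1, 11)]

# Part (2): for g(x) = x^2 at c = 3 the quotient is 6 + h, which approaches 6.
square_quotients = [difference_quotient(lambda x: x * x, 3.0, h) for h in steps]

# Part (3) at c = 0: the quotient is |h| / h, namely +1 for h > 0 and -1 for h < 0,
# so the two one-sided behaviours disagree and the limit cannot exist.
right = [difference_quotient(abs, 0.0, h) for h in steps]
left = [difference_quotient(abs, 0.0, -h) for h in steps]

# Part (3) at c = 1: both signs of h give 1, matching k'(x) = 1 for x > 0.
at_one = [difference_quotient(abs, 1.0, s * h) for h in steps for s in (1, -1)]

print(square_quotients[-1], right[-1], left[-1], set(at_one))
```

The constant values $+1$ and $-1$ on the two sides of 0 are the numerical counterpart of the “corner” in the graph of $y = |x|$.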


We see from Example 4.2.3 (3) that continuity does not imply differentiability. On 
the other hand, the following theorem states that if a function is differentiable, then 
it must be continuous. In fact, differentiability at a single point implies continuity at 
that point. This theorem, though simple, will be useful throughout this text. 


Theorem 4.2.4. Let $I \subseteq \mathbb{R}$ be an open interval, and let $f \colon I \to \mathbb{R}$ be a function. Let
$c \in I$. If $f$ is differentiable at $c$, then $f$ is continuous at $c$. If $f$ is differentiable, then $f$
is continuous.


Proof. Suppose that $f$ is differentiable at $c$. Hence

\[ \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \]

exists and equals $f'(c)$. If $x \in I - \{c\}$, then

\[ f(x) = \frac{f(x) - f(c)}{x - c} (x - c) + f(c). \]

We now use Theorem 3.2.10 and Exercise 3.2.1 to deduce that

\[ \lim_{x \to c} f(x) = \lim_{x \to c} \left[ \frac{f(x) - f(c)}{x - c} (x - c) + f(c) \right] = f'(c) \cdot 0 + f(c) = f(c). \]

In particular, $\lim_{x \to c} f(x)$ exists. It now follows from Lemma 3.3.2 that $f$ is continuous
at $c$.


Suppose that a function f is differentiable. Then f must be somewhat “nicely 
behaved,” for example it is continuous by Theorem 4.2.4. How nicely must f’ behave? 
Must f’ also be continuous? And if f’ is continuous, is it necessarily differentiable? 
As seen in the following example, the answer to these last two questions is no. 


Example 4.2.5. Our calculations in both parts of this example will be more informal 
than most of our previous examples, because they require basic formulas for differ- 
entiation (such as the Product Rule and Chain Rule, as well as the derivatives of the 
power functions and $\sin x$), with which we assume that the reader is informally familiar,
but which we have not yet proved. The Product Rule and Chain Rule will be proved
in Section 4.3, the derivatives of power functions will be computed in Section 7.2 and the derivative
of $\sin x$ will be computed in Section 7.3. Nonetheless, it is nice to see this example
now, rather than waiting until we have proved all the details. 


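(1) Let $f \colon \mathbb{R} \to \mathbb{R}$ be defined by

\[ f(x) = \begin{cases} x^2 \sin \frac{1}{x^2}, & \text{if } x \neq 0 \\ 0, & \text{if } x = 0. \end{cases} \]

See Figure 4.2.3 for the graph of $f$; the parabolas $y = x^2$ and $y = -x^2$ are shown with
dashed lines.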


Fig. 4.2.3. 


We want to find $f'(x)$ for all $x \in \mathbb{R}$. First, let $x \in \mathbb{R} - \{0\}$. Then $x \in (-\infty, 0)$ or
$x \in (0, \infty)$, and in either case by Exercise 4.2.3 (3) we can restrict our attention to
the appropriate open interval containing $x$. We can then find $f'(x)$ using the Product
Rule and Chain Rule (stated formally in Theorem 4.3.1 (4) and Theorem 4.3.3). In
particular, it is left to the reader to use these rules to verify that

\[ f'(x) = 2x \sin \frac{1}{x^2} - \frac{2}{x} \cos \frac{1}{x^2}. \]

Keep in mind that this formula for $f'(x)$ holds only for $x \neq 0$.
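To find $f'(0)$, we need to use the definition of derivatives directly. We compute

\[ f'(0) = \lim_{h \to 0} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0} \frac{h^2 \sin \frac{1}{h^2} - 0}{h} = \lim_{h \to 0} h \sin \frac{1}{h^2} = 0, \]

where the last equality holds by Lemma 3.2.8, because $\lim_{h \to 0} h = 0$, and $\left| \sin \frac{1}{h^2} \right| \leq 1$ for
all $h \in \mathbb{R} - \{0\}$.
We have therefore seen that $f'(x)$ exists for all $x \in \mathbb{R}$, and that

\[ f'(x) = \begin{cases} 2x \sin \frac{1}{x^2} - \frac{2}{x} \cos \frac{1}{x^2}, & \text{if } x \neq 0 \\ 0, & \text{if } x = 0. \end{cases} \]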


We now claim that $f'$ is not continuous at 0 (though it is continuous elsewhere). By
Lemma 3.3.2 we need to ask whether or not $\lim_{x \to 0} f'(x)$ exists and $\lim_{x \to 0} f'(x) = f'(0)$.
Using our formula for $f'(x)$ when $x \neq 0$, we see that

\[ \lim_{x \to 0} f'(x) = \lim_{x \to 0} \left[ 2x \sin \frac{1}{x^2} - \frac{2}{x} \cos \frac{1}{x^2} \right]. \]

Proceeding informally, it can be seen that as $x$ goes to 0 from the right, the value
of $\frac{2}{x}$ goes to infinity, and hence $\lim_{x \to 0^+} f'(x)$ does not exist; a similar argument holds
as $x$ goes to 0 from the left. (For a proof that these limits do not exist, we would
need a rigorous treatment of limits to infinity, which will be given in Chapter 6; the 
details of this proof make use of Exercise 3.2.1, Lemma 3.2.8, Example 6.2.7 (2), 
Theorem 6.2.8, Exercise 6.2.8 and Exercise 3.2.7 (1), though we omit the details.) 
We deduce that f’ is not continuous at 0. We have therefore found an example of a 
function f such that f’ exists everywhere, but that f’ is not continuous. 

(2) Let $g \colon \mathbb{R} \to \mathbb{R}$ be defined by

\[ g(x) = \begin{cases} -x^2, & \text{if } x < 0 \\ x^2, & \text{if } x \geq 0. \end{cases} \]

A formula for $g'$ can be computed very similarly to Part (1) of this example, by using
the formula for the derivative of power functions to compute $g'$ on each of $(-\infty, 0)$ and $(0, \infty)$, and
using the definition of the derivative to compute $g'(0)$; the details are left to the reader.
The result of such a calculation is that

\[ g'(x) = \begin{cases} 2x, & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -2x, & \text{if } x < 0. \end{cases} \]

This formula can be condensed into the single formula $g'(x) = 2|x|$ for all $x \in \mathbb{R}$. We
know by Exercise 3.3.1 (2) and Theorem 3.3.5 (3) that $g'$ is continuous. However,
it follows from Example 4.2.3 (3) that $g'$ is not differentiable; the factor of 2 does
not affect differentiability, as the reader knows informally, and as will be proved in
Theorem 4.3.1 (3). ◊
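
The behavior described in Example 4.2.5 can be observed numerically. The following Python sketch is an informal illustration under stated assumptions, not a proof: the function names `f`, `f_prime` and `g_prime`, and the sample points $x_k = 1/\sqrt{2\pi k}$, are choices made here. It checks that the difference quotients of $f$ at 0 are bounded by $|h|$, that $f'(x_k)$ is close to $-2/x_k$ and hence unbounded as $x_k$ goes to 0, and that the one-sided difference quotients of $g'(x) = 2|x|$ at 0 are $2$ and $-2$.

```python
import math

def f(x):
    """Part (1): f(x) = x^2 sin(1/x^2) for x != 0, and f(0) = 0."""
    return 0.0 if x == 0 else x ** 2 * math.sin(1 / x ** 2)

def f_prime(x):
    """The derivative from Part (1): the formula for x != 0, and 0 at x = 0."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x ** 2) - (2 / x) * math.cos(1 / x ** 2)

def g_prime(x):
    """The derivative from Part (2), g'(x) = 2|x|."""
    return 2 * abs(x)

# Difference quotients of f at 0 equal h * sin(1/h^2), so they are bounded by |h|.
for k in range(1, 6):
    h = 10.0 ** -k
    q = (f(h) - f(0.0)) / h
    assert abs(q) <= h * (1 + 1e-9)    # small allowance for rounding

# Along x_k = 1 / sqrt(2*pi*k) we have cos(1/x_k^2) = 1 up to rounding, so
# f'(x_k) is close to -2 / x_k, which is unbounded as x_k goes to 0.
for k in (1, 10, 100, 1000):
    x_k = 1 / math.sqrt(2 * math.pi * k)
    print(x_k, f_prime(x_k))

# One-sided difference quotients of g' at 0 are 2 and -2, so g' has a corner at 0.
h = 2.0 ** -20
print((g_prime(h) - g_prime(0.0)) / h, (g_prime(-h) - g_prime(0.0)) / (-h))
```

Floating-point output of this kind can only suggest the failure of continuity of $f'$ at 0; the rigorous argument is the one referred to in the example, which relies on the treatment of limits to infinity in Chapter 6.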


Although the derivatives of differentiable functions can be rather poorly behaved, 
as seen in Example 4.2.5, there are some restrictions on how badly behaved they can 
be, as we will see in Example 4.4.11. 

As strange as the functions in Example 4.2.5 are, we will see even more bizarre 
functions later on. In Section 5.2 we will see strange examples of integrable and 
non-integrable functions, and in Section 10.5, our final mathematical section of this 
text, we will see an example of a function R — R that is continuous everywhere 
but differentiable nowhere. The strange functions in Example 4.2.5, and the other 
strange functions we will see subsequently, were constructed to help mathematicians 
better understand the subtleties of the theoretical underpinnings of calculus. (See 
[GO03] for even more examples of functions with unusual and surprising properties.) 
By contrast, most of the standard functions that one encounters in mathematics and 
its applications, such as polynomials, logarithms, exponentials, sine and cosine, are 
not only differentiable, but have continuous derivatives, and in fact we can take the 
derivatives of their derivatives, and the derivatives of those and so on. To state these 
facts precisely, we need the following terminology. 


Definition 4.2.6. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$ be a
function. Suppose that $f$ is differentiable at $c$. The function $f$ is twice differentiable
at $c$ if $f'$ is differentiable at $c$. If $f'$ is differentiable at $c$, the derivative $(f')'(c)$ is
called the second derivative of $f$ at $c$, and it is denoted $f''(c)$. The function $f$ is
twice differentiable if it is twice differentiable at every number in $I$. If $f$ is twice
differentiable, the second derivative of $f$ is the function $f''\colon I \to \mathbb{R}$ whose value at $x$
is $f''(x)$ for all $x \in I$.

The $n$-th derivative of $f$ for all $n \in \mathbb{N}$ is defined as follows, using Definition by
Recursion. If $f$ is differentiable at $c$, the first derivative of $f$ at $c$ is simply the
derivative of $f$ at $c$. Suppose that $f$ is $n - 1$ times differentiable at $c$. The $(n-1)$-st
derivative of $f$ at $c$ is denoted $f^{(n-1)}(c)$. The function $f$ is $n$ times differentiable at $c$
if $f^{(n-1)}$ is differentiable at $c$. If $f^{(n-1)}$ is differentiable at $c$, the derivative $(f^{(n-1)})'(c)$
is called the $n$-th derivative of $f$ at $c$, and it is denoted $f^{(n)}(c)$. The function $f$ is $n$
times differentiable if it is $n$ times differentiable at every number in $I$. If $f$ is $n$ times
differentiable, the $n$-th derivative of $f$ is the function $f^{(n)}\colon I \to \mathbb{R}$ whose value at $x$ is
$f^{(n)}(x)$ for all $x \in I$.

The $0$-th derivative of $f$ is $f^{(0)} = f$. △
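
As a small illustration of the recursive definition, the following worked computation
(not taken from the text) lists the successive derivatives of a cubic. It assumes the
power-function derivative formula known from calculus, and the function $p$ is a
hypothetical example.

```latex
% Hypothetical example: p(x) = x^3 on the open interval I = \mathbb{R}.
\begin{align*}
p^{(0)}(x) &= x^3, \\
p^{(1)}(x) &= p'(x) = 3x^2, \\
p^{(2)}(x) &= p''(x) = 6x, \\
p^{(3)}(x) &= 6, \\
p^{(n)}(x) &= 0 \quad \text{for all } n \geq 4.
\end{align*}
```

Each line is obtained by differentiating the previous one, which is the step from
$f^{(n-1)}$ to $f^{(n)}$ in the definition.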


We will need the following terminology later on, for example when we discuss 
Taylor series of functions in Section 10.4. 

Definition 4.2.7. Let $I \subseteq \mathbb{R}$ be an open interval, and let $f\colon I \to \mathbb{R}$ be a function. The
function $f$ is continuously differentiable if $f$ is differentiable and $f'$ is continuous.
Let $n \in \mathbb{N}$. The function $f$ is continuously differentiable of order $n$ if $f^{(i)}$ exists and
is continuous for all $i \in \{1, \ldots, n\}$. The function $f$ is infinitely differentiable (also
called smooth) if $f^{(i)}$ exists for all $i \in \mathbb{N}$. △
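
The gap between these notions can be seen in a standard example, sketched here under
the assumption that the derivative at $0$ is computed directly from the definition and
elsewhere from the calculus formula for $x^2$. The function $q$ is hypothetical and is
not taken from the text.

```latex
% Hypothetical example: q(x) = x|x| on I = \mathbb{R}.
\begin{align*}
q(x) &= x\,|x| =
  \begin{cases}
    x^2,  & \text{if } x \geq 0, \\
    -x^2, & \text{if } x < 0,
  \end{cases} \\
q'(x) &= 2\,|x| \quad \text{for all } x \in \mathbb{R}.
\end{align*}
% q' is continuous, so q is continuously differentiable;
% q' is not differentiable at 0, so q is not twice differentiable at 0.
```

In particular, $q$ is continuously differentiable of order $1$ but not of order $2$, and
hence it is not smooth.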


In general, we take derivatives of functions with domains that are open intervals. 
However, there are some situations in which it is useful to take derivatives of functions 
with domains that are other types of non-degenerate intervals. For example, in both 
versions of the Fundamental Theorem of Calculus (given in Section 5.6), it is necessary 
to consider derivatives of functions with domains that are non-degenerate closed 
bounded intervals. At all points of a non-degenerate interval other than the endpoints, 
we take derivatives as usual, because any interval with its endpoints removed is an 
open interval. At the endpoints of a non-degenerate interval we simply use one-sided 
limits, as defined in Section 3.2, instead of ordinary limits in the definition of the 
derivative. 


Definition 4.2.8. Let $I \subseteq \mathbb{R}$ be a non-degenerate interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$
be a function.

1. Suppose that $c$ is a left endpoint of $I$. The function $f$ is differentiable at $c$ if
the limit
\[
\lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} = \lim_{h \to 0^+} \frac{f(c+h) - f(c)}{h}
\]
exists; if this limit exists, it is called the one-sided derivative of $f$ at $c$, and it
is denoted $f'(c)$.

2. Suppose that $c$ is a right endpoint of $I$. The function $f$ is differentiable at $c$ if
the limit
\[
\lim_{x \to c^-} \frac{f(x) - f(c)}{x - c} = \lim_{h \to 0^-} \frac{f(c+h) - f(c)}{h}
\]
exists; if this limit exists, it is called the one-sided derivative of $f$ at $c$, and it
is denoted $f'(c)$.

3. The function $f$ is differentiable if the restriction of $f$ to the interior of $I$ is
differentiable in the usual sense, and if $f$ is differentiable at the endpoints of $I$
in the sense of Parts (1) and (2) of this definition if there are endpoints. △
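
To see the one-sided difference quotient at work, here is a brief worked computation at
a left endpoint. The two functions $u$ and $v$ on $[0,1]$ are hypothetical examples
chosen for illustration, and the square root function is used informally, as in a
calculus course.

```latex
% Left endpoint c = 0 of the interval [0, 1].
% (a) u(x) = x^2 is differentiable at 0, with one-sided derivative 0:
\[
\lim_{h \to 0^+} \frac{u(0+h) - u(0)}{h}
  = \lim_{h \to 0^+} \frac{h^2}{h}
  = \lim_{h \to 0^+} h = 0.
\]
% (b) v(x) = \sqrt{x}: the difference quotient simplifies to 1/\sqrt{h}:
\[
\frac{v(0+h) - v(0)}{h} = \frac{\sqrt{h}}{h} = \frac{1}{\sqrt{h}}
  \quad \text{for all } h \in (0, 1].
\]
```

Because $1/\sqrt{h}$ is unbounded as $h$ approaches $0$ from the right, the limit in (b)
does not exist, and so $v$ is not differentiable at the endpoint $0$.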


Reflections 


In contrast to some of the material in earlier chapters, the basic concepts in the 
present section (with the possible exception of Example 4.2.5) should be familiar to 
anyone who has taken a calculus course. It would be a mistake, however, to deduce 
from this familiarity that the material in the present section, which is the concept of 
the derivative and some basic facts about this concept, is somehow simpler than the 
previous material in this text. Rather, the technical difficulties in the concept of the 
derivative are hidden in the use of limits, which were already dealt with rigorously 
in Chapter 3. In a typical introductory calculus course, the material on limits is dealt 
with in an intuitive fashion, but once the basic properties of limits are stated, then the
definition of derivatives in a calculus course is precisely the same as our definition. 
Of course, we now know that our treatment of derivatives is rigorous, because our 
treatment of limits is. 

One fact about differentiable functions that is given more prominence in a real 
analysis course than in a calculus course is Theorem 4.2.4, which says that if a function 
is differentiable then it is continuous. This fact might seem intuitively obvious, and it is 
not particularly useful for the computational aspects of differentiation and integration 
that are stressed in a calculus course, but it is very important for theoretical purposes, 
and makes its way into many proofs throughout this text. 

Something else found in this section that is not usually found in a calculus course
is the collection of weird examples in Example 4.2.5. An introductory calculus course aims
to provide students with computational tools that are useful in a broad variety of 
applications, and hence the focus is on taking derivatives of the sorts of functions 
that arise in real-world situations, which are usually nicely behaved functions. In a 
real analysis course, by contrast, the goal is to obtain a better understanding of the 
rigorous foundations of calculus, and we therefore need weird examples, which do 
not necessarily arise in any practical application of calculus, to help us determine the 
range of possible behaviors of functions. Specifically, nice functions tend to have nice 
derivatives, but we want to know if all differentiable functions have nice derivatives, 
and the functions in Example 4.2.5 tell us that the answer to this question is no. We 
will also see some very strange examples of functions in Example 5.2.6 as part of our 
discussion of integrals. 


Exercises 


Exercise 4.2.1. Using only the definition of derivatives and Lemma 4.2.2, find the 
derivative of each of the following functions. 


(1) Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = 3x - 8$ for all $x \in \mathbb{R}$.

(2) Let g: R— R be defined by g(x) =x? for all x ER. 

(3) Let $h\colon (0,\infty) \to \mathbb{R}$ be defined by $h(x) = \frac{1}{x}$ for all $x \in (0,\infty)$.

(4) Let $k\colon (0,\infty) \to \mathbb{R}$ be defined by $k(x) = \sqrt{x}$ for all $x \in (0,\infty)$.


Exercise 4.2.2. Let f: R — R be defined by 


_ jx, ifxeQ 
r= {2 ifxeR-Q 


Using only the definition of derivatives and Lemma 4.2.2, determine whether f is 
differentiable at 0. If it is, find f’(0); if it is not, show why not. 


Exercise 4.2.3. [Used throughout.] Let $I \subseteq \mathbb{R}$ be a non-degenerate interval, let $c \in I$
and let $f\colon I \to \mathbb{R}$ be a function.


(1) Suppose that $c$ is in the interior of $I$. Prove that $f$ is differentiable at $c$ if and
only if there is some $\delta > 0$ such that $f|_{I \cap (c-\delta,\, c+\delta)}$ is differentiable at $c$.

(2) State and prove the analog of Part (1) of this exercise when $c$ is an endpoint of
$I$. (There are two cases, depending upon whether $c$ is a right endpoint or a left
endpoint, and it is sufficient to do only one of the cases.)

(3) Let $J \subseteq \mathbb{R}$ be a non-degenerate interval, and let $g\colon J \to \mathbb{R}$ be a function.
Suppose that there is some open interval $U \subseteq I \cap J$ such that $c \in U$, and that
$f(x) = g(x)$ for all $x \in U$. Prove that $g$ is differentiable at $c$ if and only if $f$ is
differentiable at $c$, and if they are differentiable at $c$ then $g'(c) = f'(c)$.

(4) Let $J \subseteq \mathbb{R}$ be a non-degenerate interval, and let $h\colon J \to \mathbb{R}$ be a function.
Suppose that $c$ is an endpoint of $I$, and that there is some half-open interval
$D \subseteq I \cap J$ such that $c$ is the endpoint of $D$, and that $f(x) = h(x)$ for all $x \in D$.
Prove that if $f$ is differentiable at $c$ then $h$ is differentiable at $c$, and if they
are differentiable at $c$ then $h'(c) = f'(c)$. Find an example to show that this
result cannot be made into an if and only if statement. (There are two cases,
depending upon whether $c$ is a right endpoint or a left endpoint, and it is
sufficient to do only one of the cases.)

(5) Let $K \subseteq I$ be a non-degenerate interval. Prove that if $f$ is differentiable, then
$f|_K$ is differentiable, and $(f|_K)'(x) = f'(x)$ for all $x \in K$.


Exercise 4.2.4. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$ be a function.
Suppose that $|f(x)| \leq (x - c)^2$ for all $x \in I$. Prove that $f$ is differentiable at $c$ and
$f'(c) = 0$. (The function $f$ in Example 4.2.5 (1) is a special case of this exercise,
where $c = 0$.)


Exercise 4.2.5. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f, g\colon I \to \mathbb{R}$ be
functions. Suppose that $f(c) = g(c)$, and that $f(x) \leq g(x)$ for all $x \in I$. Prove that if $f$
and $g$ are differentiable at $c$ then $f'(c) = g'(c)$. This result might seem counterintuitive
at first, but a sketch shows that to the left of $c$ the secant lines of $f$ through $c$ appear
to have larger slope than those of $g$, whereas to the right of $c$ the secant lines of $f$
through $c$ appear to have smaller slope.


Exercise 4.2.6. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$ be a function.
Prove that $f$ is differentiable at $c$ if and only if there is some $D \in \mathbb{R}$ such that
\[
\lim_{x \to c} \frac{f(x) - f(c) - D(x - c)}{x - c} = 0,
\]
and that if there is such a number $D$, then $D = f'(c)$.


Exercise 4.2.7. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$ be a function.
The function $f$ is symmetrically differentiable at $c$ if
\[
\lim_{h \to 0} \frac{f(c+h) - f(c-h)}{2h}
\]
exists; if this limit exists, it is called the symmetric derivative of $f$ at $c$.


(1) Prove that if $f$ is differentiable at $c$, then it is symmetrically differentiable at $c$,
and the symmetric derivative of $f$ at $c$ equals the derivative of $f$ at $c$.

(2) If $f$ is symmetrically differentiable at $c$, is it necessarily differentiable at $c$?
Give a proof or a counterexample.


Exercise 4.2.8. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$c \in (a,b)$ and let $f\colon [a,b] \to \mathbb{R}$ be a function. Prove that $f$ is differentiable at $c$ if
and only if $f|_{[a,c]}$ and $f|_{[c,b]}$ are both differentiable at $c$ (as one-sided derivatives) and
$(f|_{[a,c]})'(c) = (f|_{[c,b]})'(c)$, and if $f$ is differentiable at $c$ then $f'(c) = (f|_{[a,c]})'(c) =
(f|_{[c,b]})'(c)$.

Exercise 4.2.9. Let $f\colon (0,\infty) \to \mathbb{R}$ be a function. Suppose that $f\left(\frac{x}{y}\right) = f(x) - f(y)$
for all $x, y \in (0,\infty)$, and that $f(1) = 0$.


(1) Prove that $f$ is continuous on $(0,\infty)$ if and only if $f$ is continuous at $1$.

(2) Prove that $f$ is differentiable on $(0,\infty)$ if and only if $f$ is differentiable at $1$.

(3) Prove that if $f$ is differentiable at $1$, then $f'(x) = \frac{f'(1)}{x}$ for all $x \in (0,\infty)$. (It
turns out that if $f'(1) = 1$, then $f$ equals the natural logarithm function, though
this fact is not needed for this exercise.)


4.3 Computing Derivatives


Defining derivatives is one thing, computing them is another. Although in principle
the definition of derivatives applies to all functions, computing derivatives of all but
the simplest functions using only the definition would be so cumbersome and
time-consuming that it would be infeasible to calculate derivatives when they are
needed for applications. We now state and prove the standard rules for computing
derivatives. These rules, combined with a knowledge of the derivatives of the
elementary functions (that is, polynomials, power functions, logarithms, exponentials 
and trigonometric functions), allow us to take the derivative of virtually any function 
that can be built up out of elementary functions using sums, differences, products, 
quotients and compositions. We will see a rigorous treatment of the derivatives of 
the elementary functions in Chapter 7, though for now it is assumed that the reader 
knows these derivatives from a calculus course, and we will use such derivatives in 
examples. 

We start with the following theorem, which shows how derivatives work with 
respect to the addition, subtraction, multiplication and division of functions. Whereas 
from a computational point of view what is important about this theorem is what the 
derivative of a sum, difference, product or quotient actually equals, from a theoretical 
point of view what is important is that the sum, difference, product or quotient of 
two differentiable functions is itself differentiable (with the usual caveat about not 
dividing by zero). 


Theorem 4.3.1. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$, let $f, g\colon I \to \mathbb{R}$ be functions
and let $k \in \mathbb{R}$. Suppose that $f$ and $g$ are differentiable at $c$.

1. $f + g$ is differentiable at $c$ and $[f + g]'(c) = f'(c) + g'(c)$.

2. $f - g$ is differentiable at $c$ and $[f - g]'(c) = f'(c) - g'(c)$.

3. $kf$ is differentiable at $c$ and $[kf]'(c) = kf'(c)$.

4. (Product Rule) $fg$ is differentiable at $c$ and $[fg]'(c) = f'(c)g(c) + f(c)g'(c)$.

5. (Quotient Rule) If $g(c) \neq 0$, then $\frac{f}{g}$ is differentiable at $c$ and
\[
\left[\frac{f}{g}\right]'(c) = \frac{f'(c)g(c) - f(c)g'(c)}{[g(c)]^2}.
\]

Proof. We will prove Parts (1), (4) and (5), leaving the rest to the reader in Exer- 
cise 4.3.1. 


The following proofs make use of the definition of derivatives, the definition of 
sums, differences, products and quotients of functions (given in Definition 3.2.9), and 
Theorem 3.2.10, Theorem 3.3.8 and Theorem 4.2.4. In all parts of the proof, we show 
that the derivative exists and prove the formula for the derivative simultaneously. 


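(1) We compute
\begin{align*}
\lim_{h \to 0} \frac{[f+g](c+h) - [f+g](c)}{h}
  &= \lim_{h \to 0} \frac{[f(c+h) + g(c+h)] - [f(c) + g(c)]}{h} \\
  &= \lim_{h \to 0} \frac{[f(c+h) - f(c)] + [g(c+h) - g(c)]}{h} \\
  &= \lim_{h \to 0} \left\{ \frac{f(c+h) - f(c)}{h} + \frac{g(c+h) - g(c)}{h} \right\} \\
  &= \lim_{h \to 0} \frac{f(c+h) - f(c)}{h} + \lim_{h \to 0} \frac{g(c+h) - g(c)}{h}
   = f'(c) + g'(c).
\end{align*}
Hence $[f + g]'(c)$ exists and equals $f'(c) + g'(c)$.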
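
(4) We compute
\begin{align*}
\lim_{h \to 0} \frac{[fg](c+h) - [fg](c)}{h}
  &= \lim_{h \to 0} \frac{f(c+h)g(c+h) - f(c)g(c)}{h} \\
  &= \lim_{h \to 0} \frac{f(c+h)g(c+h) - f(c+h)g(c) + f(c+h)g(c) - f(c)g(c)}{h} \\
  &= \lim_{h \to 0} \left\{ f(c+h)\,\frac{g(c+h) - g(c)}{h} + \frac{f(c+h) - f(c)}{h}\,g(c) \right\} \\
  &= \lim_{h \to 0} f(c+h) \cdot \lim_{h \to 0} \frac{g(c+h) - g(c)}{h}
     + \lim_{h \to 0} \frac{f(c+h) - f(c)}{h} \cdot g(c) \\
  &= f(c)g'(c) + f'(c)g(c).
\end{align*}
Hence $[fg]'(c)$ exists and equals $f'(c)g(c) + f(c)g'(c)$.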
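
(5) We compute
\begin{align*}
\lim_{h \to 0} \frac{\left[\frac{f}{g}\right](c+h) - \left[\frac{f}{g}\right](c)}{h}
  &= \lim_{h \to 0} \frac{\frac{f(c+h)}{g(c+h)} - \frac{f(c)}{g(c)}}{h}
   = \lim_{h \to 0} \frac{f(c+h)g(c) - f(c)g(c+h)}{h\,g(c)g(c+h)} \\
  &= \lim_{h \to 0} \frac{f(c+h)g(c) - f(c)g(c) + f(c)g(c) - f(c)g(c+h)}{h\,g(c)g(c+h)} \\
  &= \lim_{h \to 0} \left\{ \left[ \frac{f(c+h) - f(c)}{h}\,g(c)
     - f(c)\,\frac{g(c+h) - g(c)}{h} \right] \frac{1}{g(c)g(c+h)} \right\} \\
  &= \frac{f'(c)g(c) - f(c)g'(c)}{[g(c)]^2}.
\end{align*}
Hence $\left[\frac{f}{g}\right]'(c)$ exists and equals
\[
\frac{f'(c)g(c) - f(c)g'(c)}{[g(c)]^2}.
\]

The following result is an immediate consequence of Theorem 4.3.1, and we omit
the proof.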


Corollary 4.3.2. Let $I \subseteq \mathbb{R}$ be an open interval, let $f, g\colon I \to \mathbb{R}$ be functions and let
$k \in \mathbb{R}$. If $f$ and $g$ are differentiable, then $f + g$, $f - g$, $kf$ and $fg$ are differentiable,
and if $g(x) \neq 0$ for all $x \in I$ then $\frac{f}{g}$ is differentiable.
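
As a quick illustration of how these rules combine, consider a worked computation with
the Quotient Rule. The function $r$ below is a hypothetical example, and the derivatives
of $x$ and $1 + x^2$ are assumed from calculus, as the text permits in examples.

```latex
% Hypothetical example: r(x) = x / (1 + x^2) on I = \mathbb{R},
% with f(x) = x and g(x) = 1 + x^2, so that g(x) \neq 0 for all x.
\begin{align*}
r'(x) &= \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2} \\
      &= \frac{1 \cdot (1 + x^2) - x \cdot 2x}{(1 + x^2)^2} \\
      &= \frac{1 - x^2}{(1 + x^2)^2}.
\end{align*}
```

Because the denominator never vanishes, Corollary 4.3.2 applies on all of $\mathbb{R}$,
and so $r$ is differentiable everywhere.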


As an application of Theorem 4.3.1, the reader is asked in Exercise 4.3.5 to prove
the formula for the derivative of $f\colon \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^n$ for all $x \in \mathbb{R}$, where
$n \in \mathbb{N}$. We will see the analogous formula for all powers of $x$ in Section 7.2, after we
have defined such power functions rigorously.

An even more fundamental way of combining two functions than sums, differ- 
ences, products and quotients is composition, and the following theorem shows how 
to compute the derivatives of compositions of functions. 


Theorem 4.3.3 (Chain Rule). Let $I, J \subseteq \mathbb{R}$ be open intervals, let $c \in I$ and let $f\colon I \to J$
and $g\colon J \to \mathbb{R}$ be functions. Suppose that $f$ is differentiable at $c$, and that $g$ is
differentiable at $f(c)$. Then $g \circ f$ is differentiable at $c$ and $[g \circ f]'(c) = g'(f(c)) \cdot f'(c)$.


Before we give a proof of the Chain Rule, we want to present an attempted proof 
of this theorem that takes the most straightforward possible approach, though in this 
case the straightforward approach has a flaw. The attempted proof is 


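\begin{align*}
\lim_{x \to c} \frac{(g \circ f)(x) - (g \circ f)(c)}{x - c}
  &= \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{x - c} \\
  &= \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{f(x) - f(c)} \cdot \frac{f(x) - f(c)}{x - c} \\
  &= g'(f(c)) \cdot f'(c).
\end{align*}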
Before reading a valid proof of the Chain Rule, the reader should try to find the flaw in 
this attempted proof, to understand why we need to take the less than straightforward 
approach used in the proof given below; the reader is asked to explain the flaw in
Exercise 4.3.6. 


Proof of Theorem 4.3.3 (Chain Rule). Let $k\colon J \to \mathbb{R}$ be defined by
\[
k(y) =
  \begin{cases}
    \dfrac{g(y) - g(f(c))}{y - f(c)}, & \text{if } y \in J - \{f(c)\} \\
    g'(f(c)), & \text{if } y = f(c).
  \end{cases}
\]
Because $g$ is differentiable at $f(c)$, we know that
\[
\lim_{y \to f(c)} \frac{g(y) - g(f(c))}{y - f(c)}
\]
exists and equals $g'(f(c))$. It therefore follows from Lemma 3.3.2 that $k$ is continuous
at $f(c)$.

Because $f$ is differentiable at $c$, we know by Theorem 4.2.4 that $f$ is continuous
at $c$. Theorem 3.3.8 (2) then implies that $k \circ f$ is continuous at $c$. It follows from
Lemma 3.3.2 that $\lim_{x \to c} (k \circ f)(x)$ exists and equals $(k \circ f)(c)$.

By the definition of $k$ we see that if $y \in J - \{f(c)\}$ then
\[
k(y)[y - f(c)] = g(y) - g(f(c)). \tag{4.3.1}
\]
Equation 4.3.1 also holds when $y = f(c)$, which is seen by simply substituting $y = f(c)$
into both sides of the equation. Hence Equation 4.3.1 holds for all $y \in J$.

Let $x \in I - \{c\}$. Then $f(x) \in J$, and so we can substitute $y = f(x)$ into
Equation 4.3.1 to obtain
\[
k(f(x))[f(x) - f(c)] = g(f(x)) - g(f(c)).
\]
Dividing both sides of this last equation by $x - c$ (which is not zero) yields
\[
(k \circ f)(x)\,\frac{f(x) - f(c)}{x - c} = \frac{(g \circ f)(x) - (g \circ f)(c)}{x - c}.
\]
Finally, using the continuity of $k \circ f$ at $c$, Theorem 3.2.10 (4), the definition of $k$
and the fact that $f$ is differentiable at $c$, we see that
\begin{align*}
\lim_{x \to c} \frac{(g \circ f)(x) - (g \circ f)(c)}{x - c}
  &= \lim_{x \to c} (k \circ f)(x)\,\frac{f(x) - f(c)}{x - c} \\
  &= (k \circ f)(c) \cdot f'(c) = k(f(c)) \cdot f'(c) = g'(f(c)) \cdot f'(c).
\end{align*}
Hence $[g \circ f]'(c)$ exists and equals $g'(f(c)) \cdot f'(c)$.
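
The formula in the Chain Rule can be checked on a simple composition. The following
worked computation is a sketch with hypothetical functions $f$ and $g$ (not taken from
the text), and it assumes the derivatives of polynomials from calculus.

```latex
% Hypothetical example: f(x) = x^2 + 1 and g(y) = y^3, both defined on \mathbb{R},
% so that (g \circ f)(x) = (x^2 + 1)^3.
\begin{align*}
f'(c) &= 2c, \qquad g'(y) = 3y^2, \\
[g \circ f]'(c) &= g'(f(c)) \cdot f'(c)
                 = 3\,(c^2 + 1)^2 \cdot 2c
                 = 6c\,(c^2 + 1)^2.
\end{align*}
```

Expanding $(x^2 + 1)^3 = x^6 + 3x^4 + 3x^2 + 1$ and differentiating term by term gives
$6x^5 + 12x^3 + 6x = 6x(x^2 + 1)^2$, in agreement with the Chain Rule.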


The following result is an immediate consequence of the Chain Rule (Theo- 
rem 4.3.3). 


Corollary 4.3.4. Let $I, J \subseteq \mathbb{R}$ be open intervals, and let $f\colon I \to J$ and $g\colon J \to \mathbb{R}$ be
functions. If $f$ and $g$ are differentiable, then $g \circ f$ is differentiable.

As we mentioned at the end of Section 4.2, it is possible to take derivatives
of functions with domains that are non-degenerate intervals, where we take one- 
sided derivatives at the endpoints if there are any. Everything we have proved about 
derivatives in this section for functions defined on open intervals also holds for 
functions defined on other types of non-degenerate intervals, as long as we take 
one-sided derivatives at the endpoints. 

There is a very important theoretical implication of the rules of differentiation 
stated in Theorem 4.3.1 and Theorem 4.3.3. Because it is known how to take the 
derivatives of the various standard elementary functions (polynomials, power func- 
tions, logarithms, exponentials and trigonometric functions), a fact that is familiar 
to the reader from a calculus course and that is treated rigorously in Chapter 7, it 
then follows from the rules of differentiation that any function that is made up of 
elementary functions combined via addition, subtraction, multiplication, division and 
composition is also differentiable. Most of the functions that arise in the applications 
of mathematics to real-world problems are such functions (or at worst are piecewise 
such functions), and therefore most of the functions found in applications are differ- 
entiable (or piecewise differentiable), and their derivatives are themselves made up 
of elementary functions that are combined via addition, subtraction, multiplication, 
division and composition. This remarkable fact for derivatives is in contrast to the 
situation for integrals, where it is not possible to take every function that is made up 
of elementary functions combined via addition, subtraction, multiplication, division 
and composition, and express its indefinite integral as such a function; and even when 
it is possible to express the indefinite integral as such a function in principle, it is 
not always clear how to do so in practice. See the references given in the paragraph 
following Corollary 5.6.3 for details. 


Reflections 


The theorems in this section are very familiar to anyone who has taken a calculus 
course—too familiar, perhaps, because in some introductory calculus courses too 
much emphasis is placed upon the computing of derivatives as the result of rushing
as quickly as possible to the Product Rule, Quotient Rule and Chain Rule, and not 
enough emphasis is placed upon what the derivative means intuitively, and how it is 
to be applied. We have therefore separated Section 4.2, which is concerned with the 
definition and basic properties of derivatives, from the present section, which is about 
formulas for computing derivatives, in order to emphasize that understanding how 
derivatives are defined is quite separate from knowing how they are to be computed 
in practice. 

Of course, just as it is problematic when a calculus course neglects the intuitive 
meaning of derivatives in favor of too rapidly focusing upon how to compute them, 
it is no less problematic if a calculus course, in its attempt to include intuition 
and application, neglects basic computational skills. A balance is needed between 
computing, intuition and application, and, fortunately, some recent calculus texts have 
attempted to find such a balance. In a real analysis course, by contrast, none of these 
three aspects of differentiation is central, and the focus is purely theoretical. Even a 
very familiar theorem such as Theorem 4.3.1, which in a calculus course is viewed as
being important for computational purposes, will be seen to be useful in a number of 
proofs in this text. 

Not only should the statement of Theorem 4.3.1 be familiar to the reader from 
calculus courses, but the proof of this theorem is one of the few proofs in this text that 
the reader might have already encountered, virtually identically, in a calculus course. 
All of the hard work for the proof of Theorem 4.3.1 was done in our treatment of limits 
in Section 3.2, and once the properties of limits have been rigorously established, then 
the proof of Theorem 4.3.1 found in a calculus course is now seen to be rigorous as 
well. By contrast, the proof of the Chain Rule (Theorem 4.3.3) is a bit trickier than 
might be expected, and it is not the same as the informal sketch of a proof that would 
typically be seen in a calculus course. 

Though we have discussed the most common differentiation rules in the present 
section, there is one method for finding derivatives that is taught in introductory calcu- 
lus courses that we do not discuss in the present text, namely, implicit differentiation. 
There are two reasons that we do not deal with this topic. First, implicit differentiation 
is not really a separate method of differentiation, but rather it is simply an application 
of the Chain Rule. It is unfortunate that implicit differentiation is sometimes taught as 
a rote method where one simply inserts the symbol $\frac{dy}{dx}$ in certain places because “that
is how it is done,” rather than emphasizing the role of the Chain Rule. Second, to give
a completely rigorous treatment of implicit differentiation, it would be necessary to
prove the Implicit Function Theorem, which says that under the right circumstances
an equation of the form $F(x,y) = 0$ can be viewed as locally describing $y$ as a function
of $x$, the right circumstances being essentially that the curve described by $F(x,y) = 0$
is smooth and the tangent line is not vertical at the given point on the curve. To do that 
rigorously, however, would require functions of two variables and partial derivatives, 
which we do not treat in this text. 


Exercises 


Exercise 4.3.1. [Used in Theorem 4.3.1.] Prove Theorem 4.3.1 (2) (3). 


Exercise 4.3.2. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$, let $n \in \mathbb{N}$ and let $f_1, \ldots, f_n\colon
I \to \mathbb{R}$ be functions. Suppose that $f_i$ is differentiable at $c$ for all $i \in \{1, \ldots, n\}$. Prove
that $f_1 f_2 \cdots f_n$ is differentiable at $c$, and find (and prove) a formula for $(f_1 f_2 \cdots f_n)'(c)$
in terms of $f_1(c), \ldots, f_n(c)$ and $f_1'(c), \ldots, f_n'(c)$.


Exercise 4.3.3. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$ be a
function. Suppose that $f$ is differentiable at $c$. Let $f^2 = f \cdot f$. Using only the definition
of derivatives and Lemma 4.2.2, prove that $[f^2]'(c) = 2f(c)f'(c)$. Do not use any
other theorems about differentiation, such as Theorem 4.3.1 or Theorem 4.3.3.


Exercise 4.3.4. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f\colon I \to \mathbb{R}$ be a function.
Suppose that $f$ is differentiable at $c$, and that $f(c) \neq 0$. Using only the definition of
derivatives and Lemma 4.2.2, prove that $\left[\frac{1}{f}\right]'(c) = -\frac{f'(c)}{[f(c)]^2}$. Do not use any other
theorems about differentiation, such as Theorem 4.3.1.


Exercise 4.3.5. [Used throughout.] Let $I \subseteq \mathbb{R}$ be a non-degenerate open interval,
and let $n \in \mathbb{N}$. Let $f\colon I \to \mathbb{R}$ be defined by $f(x) = x^n$ for all $x \in I$. Prove that $f$ is
differentiable and $f'(x) = nx^{n-1}$ for all $x \in I$. (Strictly speaking, when $n = 1$ the
expression $x^{n-1}$ is not defined for $x = 0$, but for convenience we abuse notation and
think of the function $g\colon \mathbb{R} \to \mathbb{R}$ defined by “$g(x) = x^0$ for all $x \in \mathbb{R}$” to be the same
as the function defined by $g(x) = 1$ for all $x \in \mathbb{R}$.)


Exercise 4.3.6. [Used in Section 4.3.] Explain the flaw in the attempted proof of the 
Chain Rule (Theorem 4.3.3) that is given prior to the correct proof. Restate the Chain 
Rule with modified hypotheses that would make the attempted proof into a valid 
proof. 


Exercise 4.3.7. [Used in Lemma 7.3.4 and Theorem 7.3.12.] Let $[a,b] \subseteq \mathbb{R}$ and $[b,c] \subseteq
\mathbb{R}$ be non-degenerate closed bounded intervals, and let $f\colon [a,b] \to \mathbb{R}$ and $g\colon [b,c] \to \mathbb{R}$
be functions. Suppose that $f(b) = g(b)$. Let $h\colon [a,c] \to \mathbb{R}$ be defined by
\[
h(x) =
  \begin{cases}
    f(x), & \text{if } x \in [a,b] \\
    g(x), & \text{if } x \in [b,c].
  \end{cases}
\]

(1) Suppose that $f$ and $g$ are differentiable, and that $f'(b) = g'(b)$, where $f'(b)$
and $g'(b)$ are one-sided derivatives. Prove that $h$ is differentiable.

(2) Suppose that $b - a = c - b$, that $g(x) = f(a + c - x)$ for all $x \in [b,c]$, and that
$f'(b) = 0$, where $f'(b)$ is a one-sided derivative. Prove that $h$ is differentiable,
and that $h'(x) = -f'(a + c - x)$ for all $x \in [b,c]$.


4.4 The Mean Value Theorem 


We take the derivative of a function to learn more about the function. For example, 
the reader is familiar with the fact that a positive derivative means the function is 
increasing. We will see a proof of this fact in Section 4.5. In order to give a rigorous 
treatment of such theorems, we first need to prove a very important tool that relates 
functions to their derivatives, which is the Mean Value Theorem (Theorem 4.4.4). As 
is the case with many important theorems in real analysis, the Mean Value Theorem 
relies upon the Least Upper Bound Property of the real numbers. 

We start with the following two lemmas, which are really the essence of the Mean 
Value Theorem. The first of our lemmas gives a rigorous statement of the intuitively 
evident fact that if a differentiable function has a maximum value or a minimum value 
at a point, then the derivative must be zero at that point. (We use the terms “maximum 
value” and “minimum value” informally here, and we will not need these terms in the 
statements of lemmas and theorems.) 


Lemma 4.4.1. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let $c \in
(a,b)$ and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is differentiable at $c$. If
either $f(c) \geq f(x)$ for all $x \in [a,b]$ or $f(c) \leq f(x)$ for all $x \in [a,b]$, then $f'(c) = 0$.

Proof. Suppose that $f(c) \geq f(x)$ for all $x \in [a,b]$; the other case is similar, and we
omit the details.

Because $f$ is differentiable at $c$, then $\lim_{x \to c} \frac{f(x) - f(c)}{x - c}$ exists and equals $f'(c)$. It
follows from Lemma 3.2.17 that $\lim_{x \to c^-} \frac{f(x) - f(c)}{x - c}$ and $\lim_{x \to c^+} \frac{f(x) - f(c)}{x - c}$ exist and are equal
to $f'(c)$.

Let $x \in (a,c)$. Because $f(c) \geq f(x)$ and $x - c < 0$, then $\frac{f(x) - f(c)}{x - c} \geq 0$. Therefore, using the
analog for one-sided limits of Exercise 3.2.12 (1), we deduce that
\[
\lim_{x \to c^-} \frac{f(x) - f(c)}{x - c} \geq 0.
\]
It follows that $f'(c) \geq 0$. A similar argument shows that
\[
\lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} \leq 0,
\]
and hence that $f'(c) \leq 0$. We conclude that $f'(c) = 0$.


The following example shows that Lemma 4.4.1 cannot be made into an if and 
only if statement. 


Example 4.4.2. Let $f\colon [-1,1] \to \mathbb{R}$ be defined by $f(x) = x^3$ for all $x \in [-1,1]$. It
can be verified using the definition of derivatives that $f'(0) = 0$; the details are left
to the reader. On the other hand, it is certainly not the case that $f(0) \geq f(x)$ for all
$x \in [-1,1]$, or that $f(0) \leq f(x)$ for all $x \in [-1,1]$.


The Extreme Value Theorem (Theorem 3.5.1) states that every continuous function 
defined on a non-degenerate closed bounded interval has both a maximum value and 
a minimum value. If such a function is differentiable, must these values (or at least 
one of them) occur where the derivative is zero? Lemma 4.4.1 might appear to imply 
that the answer is yes, but consider the graph in Figure 4.4.1 (i), where we see that 
the maximum value and minimum value of a differentiable function with domain that 
is a non-degenerate closed bounded interval can both occur at the endpoints of the 
interval, which would not allow us to apply Lemma 4.4.1 to deduce the existence 
of a point in the interior of the interval with zero derivative. As is formalized in our 
next lemma, this problem can be avoided if the function has equal values at the two 
endpoints of the closed interval; see Figure 4.4.1 (ii). This lemma, known as Rolle’s 
Theorem, is actually a special case of the Mean Value Theorem, but it is easier to 
prove directly than the Mean Value Theorem, and it will be used in the proof of the 
latter. (In spite of its historical name, we call Rolle’s Theorem a lemma because its 
main role is to prove other more important results, such as the Mean Value Theorem 
and Taylor’s Theorem (Theorem 4.4.6).) 


Lemma 4.4.3 (Rolle’s Theorem). Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded
interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is continuous on $[a,b]$
and differentiable on $(a,b)$. If $f(a) = f(b)$, then there is some $c \in (a,b)$ such that
$f'(c) = 0$.

[Fig. 4.4.1: panels (i) and (ii)]


Proof. Suppose that $f(a) = f(b)$. By the Extreme Value Theorem (Theorem 3.5.1)
there are $x_{\min}, x_{\max} \in [a,b]$ such that $f(x_{\min}) \leq f(x) \leq f(x_{\max})$ for all $x \in [a,b]$. If
it were the case that $f(x_{\min}) = f(x_{\max})$, then $f$ would be a constant function, and
therefore by Example 4.2.3 (1) it would follow that $f'(x) = 0$ for all $x \in (a,b)$, and
so we could let $c$ be anything in $(a,b)$. Now suppose that $f(x_{\min}) < f(x_{\max})$. It
must therefore be the case that at least one of $f(x_{\min})$ and $f(x_{\max})$ does not equal
$f(a) = f(b)$. Without loss of generality, assume that $f(x_{\max}) \neq f(a) = f(b)$; the other
case is similar, and we omit the details. Let $c = x_{\max}$. Clearly $c \in (a,b)$. Hence $f$ is
differentiable at $c$, and it now follows from Lemma 4.4.1 that $f'(c) = 0$.


The Mean Value Theorem is a generalization of Rolle’s Theorem (Lemma 4.4.3) 
to the situation where f(a) is not necessarily equal to f(b). As is often the case in 
mathematics, generalizing a theorem successfully depends upon changing the way one 
views the theorem. If the goal of Rolle’s Theorem is to find a point c € (a,b) such that 
f'(c) = 0, then as we saw in Figure 4.4.1 (i) it is not possible to drop the requirement 
that f(a) = f(b). However, we observe that when f(a) = f(b), the line through the 
points (a, f(a)) and (b, f(b)) has slope zero, and hence that line is parallel to the 
tangent line at the point c € (a,b) such that f’(c) = 0. The Mean Value Theorem says 
that even when f(a) is not necessarily equal to f(b), there is nonetheless always a 
point c € (a,b) such that the tangent line at c is parallel to the line through the points 
(a, f(a)) and (b, f(b)). See Figure 4.4.2. In addition to the above geometric way of 
thinking about the Mean Value Theorem, another intuitive way of thinking about this 
theorem is that if a car is driven on a straight road from time t = a to t = b, then at
some point during the trip the instantaneous velocity of the car will equal its average 
velocity for the duration of the trip. 


Theorem 4.4.4 (Mean Value Theorem). Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is continuous
on $[a,b]$ and differentiable on $(a,b)$. Then there is some $c \in (a,b)$ such that
\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]

[Fig. 4.4.2]
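
For a concrete instance of the Mean Value Theorem, the number $c$ can sometimes be
found explicitly. The following sketch uses a hypothetical function (not taken from the
text) and assumes the derivative of $x^2$ from calculus.

```latex
% Hypothetical example: f(x) = x^2 on a non-degenerate interval [a, b].
\begin{align*}
\frac{f(b) - f(a)}{b - a} &= \frac{b^2 - a^2}{b - a} = a + b, \\
f'(c) = 2c = a + b \quad &\Longrightarrow \quad c = \frac{a + b}{2} \in (a, b).
\end{align*}
```

For this particular function the point $c$ is the midpoint of the interval. In general
the theorem guarantees only that some such $c$ exists, not where it lies.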


Rather than proving the Mean Value Theorem directly, we will first prove a 
generalization of it known as Cauchy’s Mean Value Theorem, and we will then 
deduce the Mean Value Theorem as a corollary of this generalization (we will need 
the generalization later on in any case, and so it is more efficient to proceed as we are 
doing). 


Theorem 4.4.5 (Cauchy’s Mean Value Theorem). Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate
closed bounded interval, and let $f, g\colon [a,b] \to \mathbb{R}$ be functions. Suppose that $f$ and
$g$ are continuous on $[a,b]$ and differentiable on $(a,b)$. Then there is some $c \in (a,b)$
such that
\[
[f(b) - f(a)]g'(c) = [g(b) - g(a)]f'(c).
\]

Proof. Let $h\colon [a,b] \to \mathbb{R}$ be defined by
\[
h(x) = [f(b) - f(a)]g(x) - [g(b) - g(a)]f(x)
\]
for all $x \in [a,b]$. We know that constant functions are continuous on $[a,b]$ by
Example 3.3.3 (1), and differentiable on $(a,b)$ by Example 4.2.3 (1). We also know
that $f$ and $g$ are continuous on $[a,b]$ and differentiable on $(a,b)$, and it then follows
from Theorem 3.3.5 and Theorem 4.3.1 that $h$ is continuous on $[a,b]$ and
differentiable on $(a,b)$. Moreover, it is seen that $h(a) = f(b)g(a) - g(b)f(a) = h(b)$. We can
therefore apply Rolle’s Theorem (Lemma 4.4.3) to $h$, and we deduce that there is
some $c \in (a,b)$ such that $h'(c) = 0$. Applying Theorem 4.3.1 to the definition of $h$
we see that $h'(x) = [f(b) - f(a)]g'(x) - [g(b) - g(a)]f'(x)$. The equation $h'(c) = 0$
therefore yields $[f(b) - f(a)]g'(c) - [g(b) - g(a)]f'(c) = 0$, which is what we needed
to show.


Observe that Cauchy’s Mean Value Theorem does not follow directly from the 
Mean Value Theorem, because if we tried to find a “c” for each of the functions f and 
g individually, we would obtain one number, say d ∈ (a,b), for the function f, and 
another number, say e ∈ (a,b), for the function g, where d and e are not necessarily 
equal, and then we would deduce that [f(b) − f(a)]g’(e) = [g(b) − g(a)]f’(d), which 
is not as nice as Cauchy’s Mean Value Theorem, which says that you can get one 
“c ∈ (a,b)” that works for the two functions f and g together (though this c would not 
necessarily work for either f or g alone). 
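
The following computation, with functions chosen here purely for illustration, shows the difference concretely. Take f(x) = x² and g(x) = x³ on the interval [0,1], so that f(1) − f(0) = 1 and g(1) − g(0) = 1.

```latex
\begin{align*}
\text{Mean Value Theorem for } f: &\quad 2d = 1, \text{ so } d = \tfrac{1}{2}, \\
\text{Mean Value Theorem for } g: &\quad 3e^2 = 1, \text{ so } e = \tfrac{1}{\sqrt{3}}, \\
\text{Cauchy's Mean Value Theorem}: &\quad 1 \cdot 3c^2 = 1 \cdot 2c \text{ with } c \in (0,1), \text{ so } c = \tfrac{2}{3}.
\end{align*}
```

The three numbers d, e and c are all different, and c = 2/3 satisfies neither 2c = 1 nor 3c² = 1.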


Proof of Theorem 4.4.4 (Mean Value Theorem). The Mean Value Theorem is the 
special case of Cauchy’s Mean Value Theorem (Theorem 4.4.5) where the function g 
is defined by g(x) = x for all x ∈ [a,b]. 


Similarly to the Extreme Value Theorem (Theorem 3.5.1) and the Intermediate 
Value Theorem (Theorem 3.5.2), we note that Rolle’s Theorem (Lemma 4.4.3), the 
Mean Value Theorem (Theorem 4.4.4) and Cauchy’s Mean Value Theorem (Theo- 
rem 4.4.5) are existence theorems, in that they each guarantee the existence of a 
certain number c, without giving any information about how to find c, nor about 
whether c is unique. Also, as the reader is asked to show in Exercise 4.4.10, these 
three theorems are equivalent to the Least Upper Bound Property. 

We now turn to a very useful extension of the Mean Value Theorem known as 
Taylor’s Theorem; we use the version of the theorem due to Joseph-Louis Lagrange 
(1736-1813). Taylor’s Theorem is useful for dealing with Taylor polynomials and 
Taylor series, as we will see in Section 10.4, but the proper way to view Taylor’s 
Theorem is as a generalized version of the Mean Value Theorem that uses higher 
derivatives. As the reader can verify, the Mean Value Theorem is a special case of 
Taylor’s Theorem, obtained by substituting n = 0, and c = a, and x = b in the latter. 

Recall from Definition 4.2.6 that f^(0) = f for any function f. 


Theorem 4.4.6 (Taylor’s Theorem). Let [a,b] ⊆ ℝ be a non-degenerate closed 
bounded interval, let c ∈ (a,b), let f : [a,b] → ℝ be a function and let n ∈ ℕ ∪ {0}. 
Suppose that f^(k) exists and is continuous on [a,b] for each k ∈ {0,...,n}, and that 
f^(n+1) exists on (a,b). Let x ∈ [a,b]. Then there is some p strictly between x and c 
(except that p = c when x = c) such that 

\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k + \frac{f^{(n+1)}(p)}{(n+1)!} (x - c)^{n+1}. \]

Proof. First, suppose that x = c. Let p = c. Then the theorem holds in this case, as 
the reader may verify. 
Now suppose that x ≠ c. Then there is a unique B ∈ ℝ such that the following 
equation holds (simply solve for B): 

\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k + B(x - c)^{n+1}. \tag{4.4.1} \]

To prove the theorem, we will show that there is some p strictly between x and c such 
that 

\[ B = \frac{f^{(n+1)}(p)}{(n+1)!}. \tag{4.4.2} \]

Let F : [a,b] → ℝ be defined by 

\[ F(z) = \sum_{k=0}^{n} \frac{f^{(k)}(z)}{k!} (x - z)^k + B(x - z)^{n+1} \tag{4.4.3} \]

for all z ∈ [a,b]. Because f^(k) exists and is continuous on [a,b] and differentiable on 
(a,b) for each k ∈ {0,...,n}, it follows from standard rules for differentiation (found 
in Example 4.2.3 (1), Theorem 4.3.1, Theorem 4.3.3 and Exercise 4.3.5) that F is 
continuous on [a,b] and differentiable on (a,b). Because the closed interval from x to 
c is contained in [a,b], it follows from Exercise 3.3.2 (2) and Exercise 4.2.3 (5) that F 
is continuous on this closed interval, and differentiable on the open interval from x to 
c. 

It follows from Equation 4.4.1 and Equation 4.4.3 that F(c) = f(x), and from 
Equation 4.4.3 alone that F(x) = f(x). We can therefore apply Rolle’s Theorem 
(Lemma 4.4.3) to F on the closed interval from x to c, and we deduce that there 
is some p strictly between x and c such that F’(p) = 0. Using the Product Rule 
(Theorem 4.3.1 (4)) and some algebraic manipulation, it is left to the reader to verify 
that 

\[ F'(z) = \frac{f^{(n+1)}(z)}{n!} (x - z)^n - B(n+1)(x - z)^n \]

for all z ∈ (a,b). The fact that F’(p) = 0 can then be rewritten as 

\[ 0 = \frac{f^{(n+1)}(p)}{n!} (x - p)^n - B(n+1)(x - p)^n, \]

which implies Equation 4.4.2. 


It is important to observe that the number p in Taylor’s Theorem (Theorem 4.4.6) 
depends upon x; the formula for f(x) in the theorem is for a single value of x ∈ [a,b], 
and is not a general formula for all x ∈ [a,b] with the same p. 
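
A small example, chosen here only for illustration, makes the dependence of p on x explicit. Take f(x) = x³ on the interval [−2,2], with c = 0 and n = 1. Because f(0) = 0, f’(0) = 0 and f''(p) = 6p, the formula in Taylor’s Theorem becomes:

```latex
\[
x^3 = f(0) + f'(0)\,x + \frac{f''(p)}{2!}\,x^2 = 3p\,x^2,
\qquad\text{so}\qquad
p = \frac{x}{3} \quad \text{for } x \neq 0.
\]
```

Thus p lies strictly between 0 and x, as the theorem asserts, and a different value of x requires a different value of p.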

We conclude this section with an application of the Mean Value Theorem to 
antiderivatives, a concept we will define after the following lemma. Intuitively, the 
second part of this lemma states the rather obvious fact that if two horses in a horse 
race run at the same speed, then they will maintain a constant distance between them. 


Lemma 4.4.7. Let I ⊆ ℝ be a non-degenerate interval, and let f, g : I → ℝ be functions. 
Suppose that f and g are continuous on I and differentiable on the interior of I. 


1. f’(x) = 0 for all x in the interior of I if and only if f is constant on I. 
2. f’(x) = g’(x) for all x in the interior of I if and only if there is some C ∈ ℝ 
such that f(x) = g(x) + C for all x ∈ I. 


Proof. 


(1) Suppose that f is constant on I. It then follows from Example 4.2.3 (1) that 
f’(x) = 0 for all x in the interior of I. 

Next, suppose that f’(x) = 0 for all x in the interior of I. Let p, q ∈ I. Suppose 
that p ≠ q. Without loss of generality, assume that p < q. Then [p,q] ⊆ I. The Mean 
Value Theorem (Theorem 4.4.4) applied to f|_{[p,q]} implies that there is some c ∈ (p,q) 
such that 

\[ f'(c) = \frac{f(q) - f(p)}{q - p}. \]

Because c ∈ (p,q), then c is in the interior of I, and it follows that f(q) − f(p) = 
f’(c)(q − p) = 0. We deduce that f is constant on I. 


(2) This part of the lemma follows from Part (1) of this lemma applied to f — g; 
we omit the details. 


We now turn to the definition of antiderivatives. The crucial thing to keep in mind 
when considering antiderivatives is that, although they turn out (via the Fundamental 
Theorem of Calculus) to be intimately related to integrals, antiderivatives are defined 
strictly in relation to derivatives. The relation between antiderivatives and integrals is 
an amazing theorem that requires proof, and not just a matter of definition. 


Definition 4.4.8. Let I ⊆ ℝ be an open interval, and let f : I → ℝ be a function. 
An antiderivative of f is a function F : I → ℝ such that F is differentiable and 
F’ = f. △ 


For a given function f : I → ℝ, where I ⊆ ℝ is an open interval, we ask whether 
it has an antiderivative, and if it does, whether the antiderivative is unique. We start with 
the latter question, which is the simpler of the two. Of course, as the reader knows 
from calculus courses, if a function has an antiderivative, then it will have more than 
one antiderivative. For example, both x² and x² + 7 are antiderivatives of 2x. Hence 
antiderivatives are not unique. However, as stated in the following corollary, the next 
best thing holds, which is that on an open interval any two antiderivatives differ by 
a constant. The corollary is an immediate consequence of Lemma 4.4.7 (2), 
and we omit the proof. 


Corollary 4.4.9. Let I ⊆ ℝ be a non-degenerate open interval, and let f : I → ℝ be 
a function. If F, G : I → ℝ are antiderivatives of f, then there is some C ∈ ℝ such that 
F(x) = G(x) + C for all x ∈ I. 


Does every function f : I → ℝ, where I ⊆ ℝ is an open interval, have an 
antiderivative? In other words, is every such function the derivative of a function? It 
turns out that every continuous function has an antiderivative, as we will see using 
Corollary 5.6.3, which is an immediate consequence of the Fundamental Theorem of 
Calculus Version I (Theorem 5.6.2). However, as we now show, not every function 
in general has an antiderivative. Our example of a function that does not have an 
antiderivative makes use of the following theorem, which relies upon ideas developed 
earlier in this section. 


Theorem 4.4.10 (Intermediate Value Theorem for Derivatives). Let I ⊆ ℝ be an 
open interval, and let f : I → ℝ be a function. Suppose that f is differentiable. Let 
a, b ∈ I, and suppose that a < b. Let r ∈ ℝ. If r is strictly between f’(a) and f’(b), 
then there is some c ∈ (a,b) such that f’(c) = r. 


Proof. Suppose that r is strictly between f’(a) and f’(b). Without loss of generality, 
assume that f’(a) < r < f’(b). 

Let g : I → ℝ be defined by g(x) = f(x) − rx for all x ∈ I. Because f is differentiable, 
it follows from Example 4.2.3 (1) and Theorem 4.3.1 (2) that g is differentiable, 
and that g’(x) = f’(x) − r for all x ∈ I. By Theorem 4.2.4 we know that g is continuous. 
The Extreme Value Theorem (Theorem 3.5.1) applied to g|_{[a,b]} implies that there 
is some c ∈ [a,b] such that g(c) ≤ g(x) for all x ∈ [a,b]. 
The definition of derivatives, combined with Lemma 3.2.17, implies that 

\[ \lim_{x \to a^+} \frac{g(x) - g(a)}{x - a} = g'(a) \quad\text{and}\quad \lim_{x \to b^-} \frac{g(x) - g(b)}{x - b} = g'(b). \]

Because g(x) = f(x) − rx for all x ∈ I, it follows that g’(a) < 0 < g’(b). We then use 
the analog for one-sided limits of Theorem 3.2.4 (2) to deduce that there is some 
N < 0 and some δ > 0 such that x ∈ I − {a} and x ∈ (a, a + δ) imply 

\[ \frac{g(x) - g(a)}{x - a} < N. \]

Because I is an open interval, we use Lemma 2.3.7 (2) to see that by taking a smaller 
value of δ if necessary, we may assume that (a, a + δ) ⊆ I. Let y ∈ (a, a + δ). Then 
y − a > 0, and hence g(y) − g(a) < 0, which implies that g(y) < g(a). Therefore c ≠ a. 
A similar argument shows that c ≠ b, and we omit the details. Hence c ∈ (a,b). 
Lemma 4.4.1 applied to g|_{[a,b]} implies that g’(c) = 0. Hence f’(c) − r = 0, and it 
follows that f’(c) = r. 


What makes the Intermediate Value Theorem for Derivatives (Theorem 4.4.10) 
interesting is that even though the derivative of a differentiable function need not 
be continuous, as seen in Example 4.2.5 (1), it turns out that even a discontinu- 
ous derivative must satisfy the property given in the Intermediate Value Theorem 
(Theorem 3.5.2). Hence, as we see in the following simple example, although deriva- 
tives need not be continuous, not every discontinuous function is the derivative of 
something. 


Example 4.4.11. Let g : ℝ → ℝ be defined by 


ifx<1 


a(x) = {3 fx 1, 


Then g is not the derivative of any function, because it does not satisfy the conclusion 
of the Intermediate Value Theorem for Derivatives (Theorem 4.4.10). ◊ 


Reflections 


The Mean Value Theorem is often treated very cursorily in introductory calculus 
courses, or is not treated at all, which is understandable due to the applied and 
computational focus of such courses. From our present perspective, by contrast, the 
Mean Value Theorem is a crucial tool used to relate the behavior of the derivative of a 
function to the behavior of the original function, as will be seen, for example, in the 
proof of Theorem 4.5.2. 

The concept of antiderivatives is introduced in this section, though in principle 
this concept could have been defined in Section 4.2, because nothing more than the 
definition of derivatives is needed. However, it is only in the present section that we 
are able to say something interesting about antiderivatives, and so we have delayed 
the definition of this concept till here. There is no problem delaying the definition of 
antiderivatives by a few sections as we have done, but there is a problem delaying that 
definition, as do some calculus books, until the chapter on integration. It is certainly 
true, as the reader knows from calculus courses, that it is in the calculation of definite 
integrals that the concept of antiderivatives becomes spectacularly useful; it is also 
true that antiderivatives, when written as “indefinite integrals,” use a notation that is 
extremely similar to the notation for “definite integrals.” Nonetheless, locating the 
definition of antiderivatives in the chapter on integrals can cause students to lose sight 
of the very important facts, stated above but worth stressing again, that antiderivatives 
are defined strictly in terms of derivatives, in spite of their notation, and that the close 
relation between antiderivatives and integrals (meaning the “definite” kind) is an 
amazing theorem and is not true simply by virtue of definition or notation. 


Exercises 


Exercise 4.4.1. Find an example of a function f : [a,b] → ℝ for some non-degenerate 
closed bounded interval [a,b] ⊆ ℝ such that f is continuous on [a,b], that f is 
differentiable on (a,b) except at one point and that f does not satisfy the conclusion 
of the Mean Value Theorem. 


Exercise 4.4.2. Find an example of a function f : [a,b] → ℝ for some non-degenerate 
closed bounded interval [a,b] ⊆ ℝ such that f is continuous and differentiable on 
(a,b), and that f does not satisfy the conclusion of the Mean Value Theorem. 


Exercise 4.4.3. Does Corollary 4.4.9 hold if I is not a single non-degenerate open 
interval, but is rather the union of finitely many such intervals? Give a proof or a 
counterexample. 


Exercise 4.4.4. Prove that √(1 + 4x) < 2x + 1 for all x ∈ (0,∞). You may use standard 
rules for differentiation, even if we have not yet proved them. 


Exercise 4.4.5. Let f : ℝ → ℝ be a function. Suppose that f is differentiable, that 
f(0) = 1 and that |f’(x)| ≤ 1 for all x ∈ ℝ. Prove that |f(x)| ≤ |x| + 1 for all x ∈ ℝ. 


Exercise 4.4.6. [Used in Section 3.4 and Lemma 10.5.1.] This exercise refers to 
Exercise 3.4.5. Let [a,b] ⊆ ℝ be a non-degenerate closed bounded interval, and let 
f : [a,b] → ℝ be a function. Suppose that f is continuous on [a,b] and differentiable 
on (a,b), and that there is some M ∈ ℝ such that |f’(x)| ≤ M for all x ∈ (a,b). Prove 
that f satisfies a Lipschitz condition with Lipschitz constant M. It follows from 
Exercise 3.4.5 (1) that f is uniformly continuous. 


Exercise 4.4.7. [Used in Exercise 7.3.6.] Let [a,b] ⊆ ℝ be a non-degenerate closed 
bounded interval, and let f : [a,b] → ℝ be a function. Suppose that f is continuous on 
[a,b] and differentiable on (a,b). Prove that if lim_{x→b−} f’(x) exists, then the one-sided 
derivative f’(b) exists and equals lim_{x→b−} f’(x). 


Exercise 4.4.8. Let I ⊆ ℝ be an open interval, and let f : I → ℝ be a function. Suppose 
that f is twice differentiable. Suppose that there are x, y, z ∈ I such that x < y < z, and 
f(x) > f(y) and f(y) < f(z). Prove that there is some c ∈ I such that f''(c) > 0. 


Exercise 4.4.9. [Used in Theorem 4.6.4 and Theorem 6.3.5.] Let I ⊆ ℝ be a non-degenerate 
open interval, and let f : I → ℝ be a function. Suppose that f is differentiable, 
and that f’(x) ≠ 0 for all x ∈ I. 


(1) Prove that f is injective. 
(2) Prove that either f’(x) > 0 for all x ∈ I, or that f’(x) < 0 for all x ∈ I. 


Exercise 4.4.10. [Used in Section 4.4.] Using the ideas in the proof of Theorem 3.5.4, 
prove that Rolle’s Theorem (Lemma 4.4.3), the Mean Value Theorem (Theorem 4.4.4) 
and Cauchy’s Mean Value Theorem (Theorem 4.4.5) are equivalent to the Least Upper 
Bound Property. 


4.5 Increasing and Decreasing Functions, Part I: Local and 
Global Extrema 


Now that we have the Mean Value Theorem (Theorem 4.4.4) at our disposal, we turn 
to one of the main reasons why we are interested in derivatives, which is that the 
derivative of a function yields geometric information about the original function, for 
example whether it is increasing or decreasing. Much of the original motivation for 
looking at such geometric properties of functions was to help people graph functions, 
which was no mean feat before graphing calculators and computers. However, even 
though modern technology makes graphing functions easy, concepts such as increasing 
and decreasing are also useful in other aspects of mathematics and its applications, 
such as optimization problems. Moreover, doing a little bit of graphing by hand, even 
when computing technology is available, helps us develop a better intuitive feel for 
functions and their graphs. 

As will be seen below, a number of key geometric concepts such as increasing, 
decreasing, local maximum, local minimum and others are not about calculus per se; 
they will be defined without reference to differentiability. However, it turns out that 
even though calculus is not part of the definition of these concepts, there are some 
questions involving these concepts that are difficult to solve without calculus, and 
which can be solved easily with calculus—that is one of the reasons calculus is so 
great. 

Our most fundamental geometric definition is the following. 


Definition 4.5.1. Let A ⊆ ℝ be a set, and let f : A → ℝ be a function. 


1. The function f is increasing if x < y implies f(x) ≤ f(y) for all x, y ∈ A. 

2. The function f is strictly increasing if x < y implies f(x) < f(y) for all 
x, y ∈ A. 

3. The function f is decreasing if x < y implies f(x) ≥ f(y) for all x, y ∈ A. 

4. The function f is strictly decreasing if x < y implies f(x) > f(y) for all 
x, y ∈ A. 

5. The function f is monotone if it is either increasing or decreasing. 

6. The function f is strictly monotone if it is either strictly increasing or strictly 
decreasing. △ 


Some books use the terms “non-decreasing” and “increasing” to mean what we 
call “increasing” and “strictly increasing,” respectively, and similarly for decreasing. 
There is no definitive terminology here, and in any book that discusses these issues, it 
is worth checking the precise definitions that are used. 

It is often difficult to compute directly from the definition when a function is 
increasing or decreasing, but in the differentiable case there is a very simple way to 
show that a function is increasing or decreasing. 


Theorem 4.5.2. Let I ⊆ ℝ be a non-degenerate interval, and let f : I → ℝ be a 
function. Suppose that f is continuous on I and differentiable on the interior of I. 


1. f’(x) ≥ 0 for all x in the interior of I if and only if f is increasing on I. 
2. If f’(x) > 0 for all x in the interior of I, then f is strictly increasing on I. 
3. f’(x) ≤ 0 for all x in the interior of I if and only if f is decreasing on I. 
4. If f’(x) < 0 for all x in the interior of I, then f is strictly decreasing on I. 


Proof. We will prove Part (1), leaving the rest to the reader in Exercise 4.5.6. 


(1) Suppose that f’(x) ≥ 0 for all x in the interior of I. Let p, q ∈ I. Suppose that 
p < q. Then [p,q] ⊆ I. The Mean Value Theorem (Theorem 4.4.4) applied to f|_{[p,q]} 
implies that there is some c ∈ (p,q) such that 

\[ f'(c) = \frac{f(q) - f(p)}{q - p}. \]

Because c ∈ (p,q), then c is in the interior of I, and it follows that f(q) − f(p) = 
f’(c)(q − p) ≥ 0. Therefore f(p) ≤ f(q). We deduce that f is increasing. 

Now suppose that f is increasing. Let c be in the interior of I. By Lemma 2.3.7 (2) 
there is some δ > 0 such that (c − δ, c + δ) ⊆ I. Let x ∈ (c, c + δ). Then f(c) ≤ f(x), 
because f is increasing. Hence (f(x) − f(c))/(x − c) ≥ 0. Using the analog for one-sided 
limits of Exercise 3.2.12 (1) we deduce that 

\[ \lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} \ge 0. \]

Because f’(c) exists, then by Lemma 3.2.17 we know that f’(c) must equal the limit 
in the above equation. It follows that f’(c) ≥ 0. 


We now see in the following example that Theorem 4.5.2 (2) (4) cannot be made 
into “if and only if” statements. 


Example 4.5.3. Let f : ℝ → ℝ be defined by f(x) = x³ for all x ∈ ℝ. The function f 
is strictly increasing, as seen by Exercise 2.3.3 (1); that exercise does not make use of 
derivatives. However, we know by Exercise 4.3.5 that f’(x) = 3x² for all x ∈ ℝ, and 
hence f’(0) = 0. Therefore Theorem 4.5.2 (2) cannot be made into an “if and only if” 
statement. A similar example shows that Theorem 4.5.2 (4) cannot be made into an 
“if and only if” statement. ◊ 


Our next issue involves finding maximum values and minimum values of a function. 
Again, the definition does not involve derivatives. 


Definition 4.5.4. Let A ⊆ ℝ be a set, let c ∈ A and let f : A → ℝ be a function. 


1. The number c is a local maximum of f if there is some δ > 0 such that x ∈ A 
and |x − c| < δ imply f(x) ≤ f(c). 

2. The number c is a local minimum of f if there is some δ > 0 such that x ∈ A 
and |x − c| < δ imply f(x) ≥ f(c). 

3. The number c is a local extremum of f if it is either a local maximum or a 
local minimum. 

4. The number c is a global maximum of f if f(x) ≤ f(c) for all x ∈ A. 

5. The number c is a global minimum of f if f(x) ≥ f(c) for all x ∈ A. 

6. The number c is a global extremum of f if it is either a global maximum or 
a global minimum. △ 


There is, once again, no definitive terminology here. Some books use the terms 
“relative maximum” and “absolute maximum,” respectively, to mean what we call 
“local maximum” and “global maximum,” and similarly for minima. Note that the 
plurals of “maximum,” “minimum” and “extremum” are “maxima,” “minima” and 
“extrema,” respectively. 

Observe in Definition 4.5.4 that the actual maximum values and minimum values 
of a function are not discussed, but only where such values occur, if they exist. 

A global maximum is always a local maximum, but not vice versa, and similarly 
for minima. A number c is a local maximum of a function if and only if it is a 
global maximum of the restriction of the function to some open interval containing 
c, and again similarly for minima. Additionally, observe that we use ≤ and ≥ in the 
definition of local and global maxima and minima, rather than < and >, respectively. 
Our definition is completely standard, and is very convenient, even though it does 
mean that local and global maxima and minima are not necessarily unique. For 
example, any point in the domain of a constant function is both a global maximum 
and a global minimum, which may not sound right at first glance, but is true according 
to Definition 4.5.4. 

As the reader knows from an introductory calculus course, there are functions 
with various combinations of global extrema and local extrema. For example, the 
function f : ℝ → ℝ defined by f(x) = 3x for all x ∈ ℝ has no local extrema or global 
extrema of any kind. By contrast, the function g : [0,7] → ℝ defined by g(x) = 3x 
for all x ∈ [0,7] has a global (and hence local) maximum and a global (and hence 
local) minimum. The function h : ℝ → ℝ defined by h(x) = |x| for all x ∈ ℝ has a 
global minimum, but no local or global maximum. The function k : ℝ → ℝ defined 
by k(x) = sin x for all x ∈ ℝ has infinitely many global maxima and global minima. 
The function p : ℝ → ℝ defined by p(x) = x³ − x for all x ∈ ℝ has a local maximum 
and a local minimum, but no global extrema. 

For many real-world applications, the goal is to find global extrema of functions. 
It is easier, however, to find local extrema, and finding them, when they exist, helps us 
locate global extrema. Hence, we examine local extrema first. The following lemma 
gives a very simple method for finding local extrema. Similarly to Definition 4.5.4, 
this lemma does not involve differentiability. 


Lemma 4.5.5. Let A ⊆ ℝ be a set, let c ∈ A and let f : A → ℝ be a function. 


1. If there is some δ > 0 such that f|_{A∩(c−δ,c]} is increasing and f|_{A∩[c,c+δ)} is 
decreasing, then c is a local maximum of f. 

2. If there is some δ > 0 such that f|_{A∩(c−δ,c]} is decreasing and f|_{A∩[c,c+δ)} is 
increasing, then c is a local minimum of f. 


Proof. Left to the reader in Exercise 4.5.7. 


The reader is asked in Exercise 4.5.8 to show that not every local extremum of a 
function satisfies the hypotheses of either part of Lemma 4.5.5. 

When a function is differentiable, there is a nice way to locate local extrema. We 
will need the following definition and lemma. 


Definition 4.5.6. Let I ⊆ ℝ be an open interval, let c ∈ I and let f : I → ℝ be a 
function. The number c is a critical point of f if either f is differentiable at c and 
f’(c) = 0, or f is not differentiable at c. △ 


Lemma 4.5.7. Let I ⊆ ℝ be an open interval, let c ∈ I and let f : I → ℝ be a function. 
If c is a local extremum of f, then c is a critical point of f. 


Proof. Suppose that c is a local extremum of f. We assume that c is a local maximum; 
the case where c is a local minimum is similar, and we omit the details. By the 
definition of local maxima, there is some δ > 0 such that x ∈ I and |x − c| < δ imply 
f(x) ≤ f(c). Because I is an open interval, we use Lemma 2.3.7 (2) to see that by 
taking a smaller value of δ if necessary, we may assume that [c − δ, c + δ] ⊆ I. Then 
f(x) ≤ f(c) for all x ∈ [c − δ, c + δ]. 

If f is not differentiable at c, then there is nothing to prove, so suppose that f is 
differentiable at c. By Exercise 4.2.3 (3) we know that f|_{[c−δ,c+δ]} is differentiable at 
c and (f|_{[c−δ,c+δ]})’(c) = f’(c). It follows from Lemma 4.4.1 applied to f|_{[c−δ,c+δ]} 
that (f|_{[c−δ,c+δ]})’(c) = 0, and hence f’(c) = 0. 


It is important to recognize that critical points need not be local extrema, as seen 
in the following example. 


Example 4.5.8. Let f : [−1,1] → ℝ be defined by f(x) = x³ for all x ∈ [−1,1]. 
Because f’(x) = 3x² for all x ∈ ℝ, then f’(0) = 0, and hence 0 is a critical point of 
f. However, as remarked in Example 4.5.3, the function f is strictly increasing, and 
therefore 0 is neither a local maximum nor a local minimum of f. ◊ 


Even though not all critical points of a function are local extrema, if we want to 
find the local extrema of a function, the standard approach is to find all the critical 
points first, and then identify which, if any, of the critical points are actually local 
extrema. The following theorem provides a good way to tell which critical points are 
local extrema. 


Theorem 4.5.9 (First Derivative Test). Let I ⊆ ℝ be an open interval, let c ∈ I and 
let f : I → ℝ be a function. Suppose that c is a critical point of f, and that f is 
continuous on I and differentiable on I − {c}. 


1. Suppose that there is some δ > 0 such that x ∈ I and c − δ < x < c imply 
f’(x) ≥ 0, and that x ∈ I and c < x < c + δ imply f’(x) ≤ 0. Then c is a local 
maximum of f. 

2. Suppose that there is some δ > 0 such that x ∈ I and c − δ < x < c imply 
f’(x) ≤ 0, and that x ∈ I and c < x < c + δ imply f’(x) ≥ 0. Then c is a local 
minimum of f. 

3. Suppose that there is some δ > 0 such that x ∈ I − {c} and |x − c| < δ imply 
f’(x) > 0, or that x ∈ I − {c} and |x − c| < δ imply f’(x) < 0. Then c is not a 
local extremum of f. 


Proof. We will prove Part (1); the other parts are similar, and we omit the details. 


(1) Because I is an open interval, we use Lemma 2.3.7 (2) to see that by taking a 
smaller value of δ if necessary, we may assume that [c − δ, c + δ] ⊆ I. Let p = c − δ 
and q = c + δ. Then [p,q] ⊆ I. Because f is continuous on I and differentiable on 
I − {c}, it follows from Exercise 3.3.2 (2) and Exercise 4.2.3 (5) that f is continuous 
on [p,c] and [c,q], and differentiable on (p,c) and (c,q). By hypothesis we know that 
f’(x) ≥ 0 for all x ∈ (p,c) and f’(x) ≤ 0 for all x ∈ (c,q). Using Theorem 4.5.2 (1)(3) 
we see that f|_{[p,c]} is increasing and f|_{[c,q]} is decreasing. Lemma 4.5.5 (1) now implies 
that c is a local maximum. 


It might appear that the three parts of the First Derivative Test (Theorem 4.5.9) 
do not cover all possible cases, because the third part has > and < rather than ≥ 
and ≤. However, if the critical point c is isolated, which means that there is some 
open interval containing c with no other critical points, then the three parts of the 
First Derivative Test do cover all possible cases. In practice, most critical points 
encountered in the applications of these methods are isolated. 
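
As an illustration of how the test is used, consider the function p : ℝ → ℝ defined by p(x) = x³ − x for all x ∈ ℝ. The sign analysis below is a sketch added as an example; it assumes only the standard derivative p’(x) = 3x² − 1.

```latex
\[
p'(x) = 3x^2 - 1 = 0 \iff x = \pm\tfrac{1}{\sqrt{3}},
\qquad
p'(x)
\begin{cases}
> 0, & \text{if } x < -\tfrac{1}{\sqrt{3}}, \\
< 0, & \text{if } -\tfrac{1}{\sqrt{3}} < x < \tfrac{1}{\sqrt{3}}, \\
> 0, & \text{if } x > \tfrac{1}{\sqrt{3}}.
\end{cases}
\]
```

By Part (1) of the test, the number −1/√3 is a local maximum of p, and by Part (2), the number 1/√3 is a local minimum of p. Both critical points are isolated.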

In addition to the First Derivative Test, there is another widely used test for 
finding local maxima and minima, namely, the Second Derivative Test. In principle, it 
would be possible to live without the Second Derivative Test, and use only the First 
Derivative Test, because the latter is usable in all cases, whereas the former is not. 
However, the Second Derivative Test is sometimes easier to use in practice than the 
First Derivative Test, and so it is worth knowing. 


Theorem 4.5.10 (Second Derivative Test). Let I ⊆ ℝ be an open interval, let c ∈ I 
and let f : I → ℝ be a function. Suppose that f is differentiable, that f’(c) = 0 and 
that f is twice differentiable at c. 


1. If f''(c) > 0, then c is a local minimum of f. 
2. If f''(c) < 0, then c is a local maximum of f. 


Proof. We will prove Part (1); the other part is similar, and we omit the details. 


(1) Suppose that f''(c) > 0. By the definition of derivatives, we know that 

\[ f''(c) = \lim_{x \to c} \frac{f'(x) - f'(c)}{x - c}. \]

Because f’(c) = 0 and f''(c) > 0, it follows that 

\[ \lim_{x \to c} \frac{f'(x)}{x - c} > 0. \]

By Theorem 3.2.4 (1) we know that there is some M > 0 and some δ > 0 such that 
x ∈ I − {c} and |x − c| < δ imply f’(x)/(x − c) > M. If x ∈ I and c − δ < x < c, then 
x − c < 0, and hence f’(x)/(x − c) > 0 implies f’(x) < 0. If x ∈ I and c < x < c + δ, then a similar 
argument shows that f’(x) > 0. Part (2) of the First Derivative Test (Theorem 4.5.9) 
now implies that c is a local minimum of f. 
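
For instance, take f : ℝ → ℝ defined by f(x) = x³ − 3x for all x ∈ ℝ, a function chosen here only as an example. Its critical points, and the values of the second derivative at them, are:

```latex
\[
f'(x) = 3x^2 - 3 = 0 \iff x = \pm 1,
\qquad
f''(x) = 6x,
\qquad
f''(1) = 6 > 0,
\qquad
f''(-1) = -6 < 0.
\]
```

Hence, by the Second Derivative Test, the number 1 is a local minimum of f and the number −1 is a local maximum of f.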


The Second Derivative Test (Theorem 4.5.10) does not say anything about what 
happens when f''(c) = 0. In such a situation, it turns out that c could be a local 
maximum, a local minimum or neither, as seen in the first part of the following 
example. Moreover, there are situations where the Second Derivative Test does not 
work, for example when f is not differentiable at c, but where the First Derivative 
Test (Theorem 4.5.9) can still be used, as seen in the second part of the following 
example. 


Example 4.5.11. 


(1) Let f, g : ℝ → ℝ be defined by f(x) = x³ and g(x) = x⁴ for all x ∈ ℝ. It 
is straightforward to verify that f’(0) = 0 and g’(0) = 0, and that f''(0) = 0 and 
g''(0) = 0. Because x⁴ = (x²)² ≥ 0 for all x ∈ ℝ, then 0 is a local (and also global) 
minimum of g. As noted in Example 4.5.8, the number 0 is not a local extremum of f. 

(2) Let k : ℝ → ℝ be defined by k(x) = |x| for all x ∈ ℝ. We saw in 
Example 4.2.3 (3) that k is not differentiable at 0, and hence 0 is a critical point of k. We 
also saw that k’(x) = −1 for all x ∈ (−∞,0), and k’(x) = 1 for all x ∈ (0,∞). Because k 
is not differentiable at 0, we cannot apply the Second Derivative Test (Theorem 4.5.10) 
to k at 0. However, the First Derivative Test (Theorem 4.5.9) can still be applied, and 
we see that 0 is a local minimum of k, which is just what we would expect by looking 
at the graph of k. ◊ 


We now turn to global extrema. Not every function has a global maximum or 
a global minimum. However, there are two very useful situations where we can 
guarantee the existence of global extrema. 

The first situation concerns continuous functions of the form f : [a,b] → ℝ, where 
[a,b] is a non-degenerate closed bounded interval. We know by the Extreme Value 
Theorem (Theorem 3.5.1) that such a function f has a global maximum and a global 
minimum, and so the question in this situation is not the existence of global extrema, 
but rather how to find them in practice (recall that the Extreme Value Theorem 
provides no such information). The key observation, which is really very simple, is 
that a global extremum must also be a local extremum. Where are the local extrema 
of our function? On the interval (a,b), we know by Lemma 4.5.7 that the local 
extrema must be critical points. The only other possibility for local extrema are the 
endpoints of the interval [a,b]. Hence, given a continuous function f : [a,b] → ℝ on 
a non-degenerate closed bounded interval, the global extrema can be found by first 
finding the critical points of f, then computing the value of f at the critical points 
and endpoints, and then comparing these values: the largest value occurs at a global 
maximum, and the smallest value occurs at a global minimum. There is no need for the 
First Derivative Test or the Second Derivative Test in this situation. 
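
As an informal illustration, which is not part of the rigorous development, the procedure just described can be sketched in Python. The helper name `global_extrema_on_closed_interval`, the sample function f(x) = x³ − 3x and the interval [−2, 3] are our own choices; the critical points −1 and 1 are found by hand from f′(x) = 3x² − 3 and passed in as data.

```python
def global_extrema_on_closed_interval(f, a, b, critical_points):
    """Compare f at the endpoints a, b and at the critical points in (a, b)."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = [(f(x), x) for x in candidates]
    max_value, max_point = max(values)
    min_value, min_point = min(values)
    return (max_point, max_value), (min_point, min_value)


def f(x):
    # Hypothetical sample function, chosen only for illustration.
    return x**3 - 3 * x


# f'(x) = 3x^2 - 3 vanishes at x = -1 and x = 1 (computed by hand).
(max_pt, max_val), (min_pt, min_val) = global_extrema_on_closed_interval(
    f, -2.0, 3.0, [-1.0, 1.0]
)
print("global maximum:", max_pt, max_val)
print("global minimum:", min_pt, min_val)
```

In this example the smallest value is attained both at the endpoint −2 and at the critical point 1; the sketch reports only one of the two, which is a reminder that a global extremum need not occur at a unique point.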

The other situation where it is easy to find global extrema concerns continuous 
functions of the form f: I → ℝ, where I is a non-degenerate open interval, and
when we have the added condition that f has only one critical point. The following 
theorem, which is rarely stated explicitly in calculus texts, is often used implicitly in 
optimization word problems, and so we prove it here. 


Theorem 4.5.12. Let I ⊆ ℝ be an open interval, let c ∈ I and let f: I → ℝ be a
function. Suppose that f is continuous, and that c is the only critical point of f.


1. If c is a local maximum, then it is a global maximum.
2. If c is a local minimum, then it is a global minimum.


Proof. We will prove Part (1); the other part is similar, and we omit the details. 


(1) Suppose that c is a local maximum. Suppose further that c is not a global
maximum. Hence, there is some d ∈ I such that f(c) < f(d). Without loss of generality,
assume that c < d.

Because c is a local maximum of f, there is some δ > 0 such that x ∈ I and
|x − c| < δ imply f(x) ≤ f(c). By choosing δ sufficiently small, we may suppose
that [c, c + δ) ⊆ [c,d]. Then x ∈ [c, c + δ) implies f(x) ≤ f(c).

Because f is continuous, then $f|_{[c,d]}$ is continuous by Exercise 3.3.2 (2). The
Extreme Value Theorem (Theorem 3.5.1) applied to $f|_{[c,d]}$ implies that there are
$x_{\min}, x_{\max} \in [c,d]$ such that $f(x_{\min}) \le f(x) \le f(x_{\max})$ for all x ∈ [c,d]. Because
f(c) < f(d), it cannot be the case that $x_{\min} = d$.

There are now two cases. First, suppose that $x_{\min} = c$. It follows that f(c) ≤ f(x)
for all x ∈ [c, c + δ). However, we saw above that f(x) ≤ f(c) for all x ∈ [c, c + δ),
and we deduce that f is constant on [c, c + δ). It follows that f is differentiable at x
and f′(x) = 0 for all x ∈ (c, c + δ). Therefore every number in (c, c + δ) is a critical
point of f, which is a contradiction to the fact that c is the only critical point of f.
Second, suppose that $x_{\min} \ne c$. Therefore $x_{\min} \in (c,d)$. By Lemma 4.5.7 applied to
$f|_{(c,d)}$ we know that $x_{\min}$ must be a critical point of $f|_{(c,d)}$, and hence $x_{\min}$ must be a
critical point of f by Exercise 4.2.3 (3), again a contradiction to the fact that c is the
only critical point of f. We conclude that c is a global maximum.
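
The following Python sketch gives a numerical sanity check of Part (2) of Theorem 4.5.12 on one example; it is an illustration and not a proof. The function f(x) = x + 1/x on the open interval (0, ∞) is our own choice: its derivative 1 − 1/x² vanishes only at c = 1, where f has a local minimum, so the theorem predicts that f(1) = 2 is the global minimum. The sample grid is likewise an assumption made for the illustration.

```python
def f(x):
    # Hypothetical example on the open interval (0, infinity).
    return x + 1.0 / x


c = 1.0  # the only critical point, since f'(x) = 1 - 1/x^2

# Sample points 0.01, 0.02, ..., 20.00, all inside (0, infinity).
samples = [0.01 * k for k in range(1, 2001)]
smallest_sample_value = min(f(x) for x in samples)

print("f(c) =", f(c))
print("smallest sampled value =", smallest_sample_value)
```

No sampled value falls below f(c), which is consistent with the theorem; of course, a finite sample can only fail to contradict the conclusion, and cannot establish it.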


Reflections 


The material in this section affords us the opportunity to consider the interplay 
between intuitive concepts and rigorous definitions. For example, we all have an 
intuitive idea of what it means for a function to be increasing—the graph “goes 
up” as we move to the right. The actual rigorous definition of this concept given
in Definition 4.5.1 (1) appears to capture that idea of “going up” quite nicely, and
so most of us would not hesitate to use this rigorous definition. However, there are 
some intuitive concepts for which it does not appear possible to find a rigorous 
definition that so self-evidently captures the intuitive concept. The definition of area 
in Section 5.9 is such an example. We all have an intuitive idea of what the area of 
a region in the plane means, but the rigorous definition of this concept is somewhat 
tricky, and took mathematicians a very long time to figure out. 

Ultimately, the problem is that we cannot prove that what has been defined 
rigorously is the same as the idea about which we have an intuitive understanding, 
because intuitive ideas are not susceptible to rigorous proofs. Intuitive ideas and 
rigorous definitions exist in separate worlds, but they are worlds that, if we are careful 
with our choice of rigorous definitions, will nicely correlate with each other. Even 
if one cannot prove that a rigorous definition faithfully captures an intuitive idea, 
it is possible in many cases to be reasonably certain that a rigorous definition truly 
captures the intuitive idea if one can verify that the concept that has been defined 
rigorously behaves the way the intuitive concept is supposed to behave. It is the 
behavior of mathematical objects, not how they are defined, that provides evidence 
for a correlation between the world of rigor and the world of intuition. 


Exercises 


Exercise 4.5.1. [Used in Exercise 4.6.4.] Let A ⊆ ℝ be a set, and let f: A → ℝ be a
function. Prove that if f is monotone and injective, then f is strictly monotone. 


Exercise 4.5.2. 


(1) Find an example of a continuous increasing function f: (0,1) → ℝ such that
f((0,1)) is a non-degenerate closed bounded interval.

(2) Can there be a strictly increasing function f: (0,1) → ℝ such that f((0,1)) is
a closed bounded interval? Either give an example to show that there is such a
function, or give a proof that there cannot be one.


Exercise 4.5.3. Let [a,b] ⊆ ℝ and [b,c] ⊆ ℝ be non-degenerate closed bounded
intervals, and let f: [a,b] → ℝ and g: [b,c] → ℝ be functions. Let h: [a,c] → ℝ be
defined by

\[
h(x) = \begin{cases} f(x), & \text{if } x \in [a,b] \\ g(x), & \text{if } x \in [b,c]. \end{cases}
\]

Suppose that f(b) = g(b). Prove that if f and g are increasing, then h is increasing.


Exercise 4.5.4. Let A ⊆ ℝ be a set, and let f, g: A → ℝ be functions. Suppose that f
and g are increasing.


(1) Prove that f + g is increasing.
(2) Is f − g necessarily either increasing or decreasing? Give a proof or a
counterexample.




Exercise 4.5.5. [Used in Exercise 4.6.4.] Let I ⊆ ℝ be a non-degenerate open bounded
interval, let c ∈ I and let f: I → ℝ be a function. Suppose that f is differentiable at c.


(1) Prove that if f′(c) > 0, then there is some δ > 0 such that (c − δ, c + δ) ⊆ I,
that x ∈ (c − δ, c] implies f(x) ≤ f(c), and that x ∈ [c, c + δ) implies f(x) ≥ f(c).

(2) Prove that if f′(c) < 0, then there is some δ > 0 such that (c − δ, c + δ) ⊆ I,
that x ∈ (c − δ, c] implies f(x) ≥ f(c), and that x ∈ [c, c + δ) implies f(x) ≤ f(c).

Exercise 4.5.6. [Used in Theorem 4.5.2.] Prove Theorem 4.5.2 (2) (3) (4). 


Exercise 4.5.7. [Used in Lemma 4.5.5.] Prove Lemma 4.5.5. 


Exercise 4.5.8. [Used in Section 4.5.] Find an example of a function f: ℝ → ℝ
such that f has a local minimum at 0, but that f does not satisfy the hypotheses of 
Lemma 4.5.5 (2). Defining the function by sketching its graph is sufficient. 


Exercise 4.5.9. [Used in Exercise 7.4.1.] Let (a,b) ⊆ ℝ be a non-degenerate open
interval, let c ∈ (a,b) and let f: (a,b) → ℝ be a function. Suppose that f is increasing
and bounded. Prove that lub f([c,b)) = lub f((a,b)).


Exercise 4.5.10. [Used in Exercise 5.8.8, Theorem 6.4.11 and Exercise 7.4.1.] Let
[a,b) ⊆ ℝ be a non-degenerate half-open interval, and let f, g: [a,b) → ℝ be functions.
Suppose that f is increasing.

(1) Prove that if f is bounded, then $\lim_{x \to b^-} f(x)$ exists and
$\lim_{x \to b^-} f(x) = \operatorname{lub} f([a,b))$.

(2) Prove that if f(x) ≤ g(x) for all x ∈ [a,b), and if $\lim_{x \to b^-} g(x)$ exists, then
$\lim_{x \to b^-} f(x)$ exists and $\lim_{x \to b^-} f(x) \le \lim_{x \to b^-} g(x)$.


Exercise 4.5.11. [Used in Exercise 5.7.5 and Theorem 6.4.12.] Let [a,b] ⊆ ℝ be a
non-degenerate closed bounded interval, and let f: [a,b] → ℝ be a function. Suppose that
f is continuous and injective. Prove that f is strictly monotone. [Use Exercise 3.3.2.]


4.6 Increasing and Decreasing Functions, Part II: Further Topics 


In this section we discuss two additional topics that are related to the concept of 
increasing and decreasing functions, the first of which is the differentiability of 
inverse functions, and the second of which is concave up functions. These two topics 
are independent of each other; the second topic starts after the proof of Theorem 4.6.4. 

For our first topic, rather than using differentiation as a tool to help analyze geo- 
metric properties of functions, as we did in Section 4.5, we reverse the approach, and 
will use the concepts of increasing and decreasing to help us differentiate functions, 
specifically the inverse functions of bijective differentiable functions. This topic will 
be useful in our study of exponential functions in Section 7.2 and trigonometric 
functions in Section 7.3.

We start with a very brief review of some basic facts about inverse functions
generally, before turning to the specific question of differentiability of such functions. We
assume that the reader is familiar with basic concepts such as injectivity, surjectivity, 
bijectivity and inverse functions; we mention here only a few key facts for review. 
See [Blo10, Section 4.4] for a more thorough treatment of this material. 

Let A and B be sets, and let f: A → B be a function. Suppose that f is bijective.
Then f has an inverse function, which is denoted f⁻¹: B → A. By the definition of
inverse functions, we know that f⁻¹(f(x)) = x for all x ∈ A and f(f⁻¹(x)) = x for
all x ∈ B. These two equations can be rephrased by saying that x = f⁻¹(y) if and only
if y = f(x) for all x ∈ A and y ∈ B. Whereas this latter formulation is usually less
convenient than the former formulation from the point of view of rigorous proofs, we
mention it because it might be familiar to the reader from precalculus and calculus
courses, for example where the natural logarithm function ln is defined by saying that
y = ln x if and only if x = e^y, and similarly for the inverse trigonometric functions.
In the particular case of a function of the form f: A → B, where A, B ⊆ ℝ, such a
function f is bijective if and only if every horizontal line through a number in B
intersects the graph of f in precisely one point. In that case the graph of f⁻¹ can be
obtained by reflecting the graph of f in the line y = x.

Injectivity alone is not sufficient to guarantee that a function has an inverse. 
However, by restricting the codomain of an injective function f: A → B, we can also
view it as a bijective function f: A → f(A). In principle, changing the codomain of
a function changes the function, and we should not really use the same letter “f” to
denote both f: A → B and f: A → f(A). However, we will use this abuse of notation
because it will make for easier reading, and because no confusion should arise. If
f: A → B is injective, then the function f: A → f(A) has an inverse function, which
will be denoted f⁻¹: f(A) → A.

We now turn to the question of the differentiability of inverse functions. Suppose
that f: I → ℝ is an injective differentiable function, where I ⊆ ℝ is a non-degenerate
open interval; as above we view f as a bijective function f: I → f(I). Is
f⁻¹: f(I) → I necessarily differentiable? Indeed, is f(I) necessarily an open interval,
which we would want in order to take the derivative of f⁻¹? Although the answer to
the latter question is yes, as the reader is asked to prove in Exercise 4.6.4, the answer
to the former question is no, as we see in the following example.


Example 4.6.1. Let f: ℝ → ℝ be defined by f(x) = x³ for all x ∈ ℝ. Intuitively, we
know that the function f is bijective, and hence it has an inverse function f⁻¹: ℝ → ℝ,
which we write as f⁻¹(x) = ∛x for all x ∈ ℝ. Moreover, we know that the graph of
f⁻¹ is obtained from the graph of f by reflection in the line y = x. Because f has a
horizontal tangent line at the origin, then the graph of f⁻¹ has a vertical tangent line
at x = 0, which makes it not differentiable at x = 0.

The above intuitive ideas, though correct, do not constitute a rigorous proof, 
because we cannot rely upon graphical arguments. Moreover, we have not yet seen 
a rigorous treatment of how to find the derivative of power functions such as ∛x;
we will see that in Section 7.2, but it will have to wait until after we have discussed
integration in Chapter 5. In the meantime, however, we can offer the following proof
that f⁻¹ is not differentiable at x = 0 using an ad hoc argument based upon only what
we have seen so far, together with Lemma 4.6.2, which immediately follows this 
example, and which does not make use of anything in this example. 

As stated in Example 4.5.3, the function f is strictly increasing (and hence 
injective), and it is differentiable, with derivative f′(x) = 3x² for all x ∈ ℝ. Observe
that f′(0) = 0. By Theorem 4.2.4 we know that f is continuous. Additionally, it
follows from Exercise 2.3.3 (2) that f(ℝ) is not bounded above and is not bounded
below. 

We can now use the various parts of Lemma 4.6.2 below to deduce that f is
bijective, that f(ℝ) = ℝ and that f⁻¹ is continuous and strictly increasing. We will
show that f⁻¹ is not differentiable at 0.

We start with some preliminary observations. First, we know that 0³ = 0, and
hence ∛0 = 0. Further, because f⁻¹ is bijective, then ∛x ≠ 0 when x ≠ 0. Second,
the condition f(f⁻¹(x)) = x for all x ∈ ℝ can be written as (∛x)³ = x for all x ∈ ℝ.
We deduce that (∛x)² ∛x = x for all x ∈ ℝ, and hence

\[
\frac{\sqrt[3]{x}}{x} = \frac{1}{(\sqrt[3]{x})^2}
\]

for all x ∈ ℝ such that x ≠ 0. (Of course, this last fact can be proved more easily using the standard
properties of power functions, with which the reader is informally familiar, and which
we will prove in Section 7.2.) Third, we note that because f⁻¹ is continuous, we can
deduce in particular that $\lim_{x \to 0} \sqrt[3]{x} = \sqrt[3]{0} = 0$. It then follows from Theorem 3.2.10 (4)
that $\lim_{x \to 0} (\sqrt[3]{x})^2 = 0$. We now use Exercise 3.2.8 to deduce that

\[
\lim_{x \to 0} \frac{f^{-1}(x) - f^{-1}(0)}{x - 0} = \lim_{x \to 0} \frac{\sqrt[3]{x} - 0}{x - 0} = \lim_{x \to 0} \frac{\sqrt[3]{x}}{x} = \lim_{x \to 0} \frac{1}{(\sqrt[3]{x})^2}
\]

does not exist. It follows that f⁻¹ is not differentiable at 0. ◊
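
The failure of differentiability in Example 4.6.1 can also be observed numerically. The Python sketch below, which is an illustration only, evaluates the difference quotient of the cube root function at 0 for a few small values of h; the helper names `cube_root` and `difference_quotient_at_zero` are our own, and the cube root of a negative number is computed through its absolute value because a fractional power of a negative float is not a real number in Python.

```python
import math


def cube_root(x):
    """Real cube root, that is, the inverse of x -> x**3 on all of R."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def difference_quotient_at_zero(h):
    """(f^{-1}(h) - f^{-1}(0)) / (h - 0) for f(x) = x**3 and h != 0."""
    return (cube_root(h) - cube_root(0.0)) / h


for h in [1e-2, 1e-4, 1e-6, -1e-6]:
    print(h, difference_quotient_at_zero(h))
```

The printed quotients grow without bound as h approaches 0 from either side, in agreement with the fact that the quotient equals 1/(∛h)².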


The reasoning used in Example 4.6.1 applies to any injective differentiable func- 
tion that has its derivative equal to zero at a point, so that the inverse of any such 
function will not be differentiable. Fortunately, as seen in Theorem 4.6.4 below, a 
derivative being zero is the only obstacle to the differentiability of inverse functions. 
We start with the following lemma, which is about inverses of monotone functions 
(not necessarily differentiable). 


Lemma 4.6.2. Let I ⊆ ℝ be a non-degenerate open interval, and let f: I → ℝ be a
function. Suppose that f is strictly monotone.


1. The function f: I → f(I) is bijective.
2. Suppose that f is continuous. Then f(I) is a non-degenerate open interval,
and one of the following holds:
a. If the interval f(I) is bounded, then f(I) = (glb f(I), lub f(I)).
b. If the interval f(I) is bounded above but is not bounded below, then
f(I) = (−∞, lub f(I)).
c. If the interval f(I) is bounded below but is not bounded above, then
f(I) = (glb f(I), ∞).
d. If the interval f(I) is not bounded above and is not bounded below, then
f(I) = ℝ.
3. If f is continuous and strictly increasing (or strictly decreasing), then the
function f⁻¹: f(I) → I is continuous and strictly increasing (or strictly decreasing,
respectively).


Proof. We will prove Part (2), leaving the rest to the reader in Exercise 4.6.1. 


(2) Suppose that f is strictly increasing; the case where f is strictly decreasing is 
similar, and we omit the details. 

We will prove Item (a) of this part of the lemma, leaving the rest to the reader in 
Exercise 4.6.2. 

Suppose that f(I) is bounded. Because I ≠ ∅, then f(I) ≠ ∅, and therefore the
Least Upper Bound Property and the Greatest Lower Bound Property imply that f(I)
has a least upper bound and a greatest lower bound.

Let x ∈ f(I). Then there is some z ∈ I such that f(z) = x. Because I is an open
interval, it follows from Lemma 2.3.7 (2) that there are c, d ∈ I such that c < z < d.
Because f is strictly increasing, then f(c) < f(z) < f(d), and hence f(c) < x < f(d).
Therefore glb f(I) ≤ f(c) < x < f(d) ≤ lub f(I), and hence x ∈ (glb f(I), lub f(I)).
We deduce that f(I) ⊆ (glb f(I), lub f(I)).

Let y ∈ (glb f(I), lub f(I)). Then glb f(I) < y < lub f(I). Let ε = lub f(I) − y.
Then ε > 0. By Lemma 2.6.5 (1) there is some q ∈ f(I) such that lub f(I) − ε < q ≤
lub f(I). Hence lub f(I) − (lub f(I) − y) < q ≤ lub f(I), which yields y < q. A similar
argument shows that there is some p ∈ f(I) such that p < y; we omit the details.

Because p, q ∈ f(I), there are s, t ∈ I such that f(s) = p and f(t) = q. We know
p ≠ q, and hence s ≠ t. If s > t, then because f is strictly increasing it follows that
f(s) > f(t), which means p > q, which is a contradiction. Therefore s < t.

Because I is an interval, we know that [s,t] ⊆ I. It follows from Exercise 3.3.2 (2)
that $f|_{[s,t]}$ is continuous. Observe that f(s) < y < f(t). The Intermediate Value
Theorem (Theorem 3.5.2) applied to $f|_{[s,t]}$ implies that there is some r ∈ (s,t) such
that f(r) = y. Hence y ∈ f(I). We deduce that (glb f(I), lub f(I)) ⊆ f(I). Therefore
f(I) = (glb f(I), lub f(I)).


Example 4.6.3. We want to show that the square root function is continuous. Let
f: (0,∞) → ℝ be defined by f(x) = x² for all x ∈ (0,∞). By Exercise 3.5.6 (1) we see
that f is strictly increasing, and by Example 3.3.7 (1) we see that f is continuous.
Exercise 3.5.6 implies that f((0,∞)) = (0,∞). It then follows from Lemma 4.6.2 (3)
that f⁻¹: (0,∞) → (0,∞) is continuous and strictly increasing. By Definition 2.6.10
we see that f⁻¹(x) = √x for all x ∈ (0,∞). The continuity of this function could also
be shown directly by an ε–δ proof, but Lemma 4.6.2 allows us to avoid that. ◊
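
Lemma 4.6.2 guarantees that the inverse of a continuous strictly increasing function exists, but it does not say how to compute it. One standard numerical approach, which mirrors the use of the Intermediate Value Theorem in the proof above, is bisection. The following Python sketch is an illustration under stated assumptions: the name `inverse_by_bisection`, the tolerance, and the bracketing interval [0, 2] used for the squaring function are our own choices, and the caller is assumed to supply numbers lo < hi with f(lo) ≤ y ≤ f(hi).

```python
def inverse_by_bisection(f, y, lo, hi, tol=1e-12):
    """Approximate f^{-1}(y) for a continuous strictly increasing f.

    Assumes lo < hi and f(lo) <= y <= f(hi), so that y is attained on [lo, hi].
    """
    if not (f(lo) <= y <= f(hi)):
        raise ValueError("y is not bracketed by f(lo) and f(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < y:
            lo = mid   # the solution lies to the right of mid
        else:
            hi = mid   # the solution lies at mid or to its left
    return (lo + hi) / 2.0


def square(x):
    return x * x


root_two = inverse_by_bisection(square, 2.0, 0.0, 2.0)
print(root_two)
```

Because the function is strictly increasing, each step of the loop keeps the solution inside the current interval while halving the interval's length.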


We are now ready to prove the formula for the derivative of the inverse of a
differentiable function, subject to suitable hypotheses. This formula, given in
Theorem 4.6.4 (4) below, is often “proved” in a calculus course roughly as follows (though
without our way of writing functions). “Let f: I → f(I) be a differentiable function
that has an inverse function f⁻¹: f(I) → I, which means that f(f⁻¹(x)) = x for all
x ∈ f(I). Taking the derivative of each side of this equation, and making use of the
Chain Rule on the left-hand side, we obtain f′(f⁻¹(x)) · [f⁻¹]′(x) = 1 for all x ∈ f(I).
Dividing both sides of this equation by f′(f⁻¹(x)) yields the desired formula for
[f⁻¹]′(x).”

The problem with the above “proof” is that it assumes that f⁻¹ is differentiable,
which is needed for the Chain Rule to be used, but such an assumption is not justified
unless we prove it. The proof that f⁻¹ is differentiable turns out to be a bit more
tricky than the above alleged proof. Moreover, this proof also shows, with no extra
effort, that the formula for [f⁻¹]′(x) holds, and because of that we can skip the above
alleged proof entirely.


Theorem 4.6.4. Let I ⊆ ℝ be a non-degenerate open interval, and let f: I → ℝ be a
function. Suppose that f is differentiable, and that f′(x) ≠ 0 for all x ∈ I.


1. The function f is strictly monotone.

2. The function f: I → f(I) is bijective.

3. The function f⁻¹: f(I) → I is differentiable.
4. The derivative of f⁻¹ is given by

\[
[f^{-1}]'(x) = \frac{1}{f'(f^{-1}(x))}
\]

for all x ∈ f(I).


Proof. Throughout this proof, when we write f, we will think of it as a function
I → f(I).


(1) This part of the theorem follows immediately from Exercise 4.4.9 (2) and 
Theorem 4.5.2 (2) (4), which is stated for closed intervals, but also holds for open 
intervals. 


(2) The fact that f is bijective follows immediately from Part (1) of this theorem 
and Lemma 4.6.2 (1). 


(3) & (4) We start with some preliminary observations. By Theorem 4.2.4 we
know that f is continuous, and by Part (1) of this theorem we know that f is strictly
monotone. Lemma 4.6.2 (3) implies that f⁻¹: f(I) → I is continuous and strictly
monotone.

Let c ∈ f(I). Let d = f⁻¹(c). Let F: I − {d} → ℝ be defined by

\[
F(x) = \frac{x - d}{f(x) - f(d)}
\]

for all x ∈ I − {d}. By Part (2) of this theorem we know that f is bijective, and hence
if x ∈ I − {d} then f(x) ≠ f(d), which implies that F is well-defined.

By hypothesis we know that f′(d) ≠ 0. We can therefore use Theorem 3.2.10 (5),
together with the definition of derivatives, to compute

\[
\lim_{x \to d} F(x) = \lim_{x \to d} \frac{x - d}{f(x) - f(d)} = \lim_{x \to d} \frac{1}{\dfrac{f(x) - f(d)}{x - d}} = \frac{1}{f'(d)}.
\]

Because f⁻¹ is continuous, we know by Lemma 3.3.2 that $\lim_{y \to c} f^{-1}(y) = f^{-1}(c) = d$.
Theorem 3.2.12 now implies that $\lim_{y \to c} (F \circ f^{-1})(y)$ exists and
$\lim_{y \to c} (F \circ f^{-1})(y) = \lim_{x \to d} F(x)$. Then

\[
\lim_{y \to c} \frac{f^{-1}(y) - f^{-1}(c)}{y - c} = \lim_{y \to c} \frac{f^{-1}(y) - d}{f(f^{-1}(y)) - f(d)} = \lim_{y \to c} (F \circ f^{-1})(y) = \lim_{x \to d} F(x) = \frac{1}{f'(d)}.
\]

It follows that [f⁻¹]′(c) exists and

\[
[f^{-1}]'(c) = \frac{1}{f'(f^{-1}(c))}.
\]

We now turn to the second topic of this section, which is the idea of functions 
being concave up or concave down. We follow [Gor02] in part. Similarly to our 
discussion in Section 4.5, this topic concerns geometric properties of graphs of 
functions. It is quite simple to visualize concave up and concave down functions 
intuitively, as in Figure 4.6.1. For a rigorous approach to these concepts, we take as 
our model the concepts of increasing and decreasing, in that these concepts are defined 
without reference to calculus, and then we saw in Theorem 4.5.2 that for differentiable 
functions, increasing and decreasing can be characterized in terms of the derivative. 
Unfortunately, the analogy between concave up and concave down, and increasing and 
decreasing, is not perfect. On the one hand, we will have a theorem analogous to Theo- 
rem 4.5.2, but with the second derivative replacing the first derivative. On the other 
hand, the standard definition of concave up and concave down that is given in calculus 
courses, which is that the derivative of the function is increasing, is quite different 
in nature from the geometric (and non-calculus-based) definition of increasing and 
decreasing. It would be nicer to have a non-calculus definition of concave up and 
concave down, because the idea is inherently geometric. For the sake of brevity, we 
will restrict our attention to concave up; a treatment of concave down is similar, and 
we omit the details. 

We will, in fact, give two variant definitions of concave up in Theorem 4.6.6 below, 
though these characterizations of concave up are not as simple as the definition of 
increasing and decreasing. These characterizations involve the notion of a secant line 
to a curve, which is a line through pairs of distinct points on the curve, and which is 
defined in Definition 4.6.5. In Figure 4.6.2 a concave up graph of a function is shown, 
together with two of its secant lines. The reader will observe that each secant line is 
above the curve in the interval that is between the points where the line intersects the 
curve; that is the intuitive idea of the first characterization of concave up. The reader 
will also observe that of the two secant lines, the one that is to the right has a larger 
slope; that is the intuitive idea of the second characterization of concave up. 


[Fig. 4.6.1: graphs of a concave up function and a concave down function.]

[Fig. 4.6.2: graph of a concave up function.]

Definition 4.6.5. Let I ⊆ ℝ be an open interval, let a, b ∈ I and let f: I → ℝ be a
function. Suppose that a < b. The secant line through (a, f(a)) and (b, f(b)) is the
function $S_{a,b}$: ℝ → ℝ defined by

\[
S_{a,b}(x) = f(a)\,\frac{b - x}{b - a} + f(b)\,\frac{x - a}{b - a}
\]

for all x ∈ ℝ. The slope of the secant line through (a, f(a)) and (b, f(b)), denoted
$M_{a,b}$, is defined by

\[
M_{a,b} = \frac{f(b) - f(a)}{b - a}. \qquad \triangle
\]

It is left to the reader to verify that the formula for $S_{a,b}(x)$ given in Definition 4.6.5
is indeed the straight line through the points (a, f(a)) and (b, f(b)).
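
A quick numerical check of this claim on one example can be sketched in Python as follows. The helper names `secant_line` and `secant_slope`, the sample function f(x) = x², and the points a = 1 and b = 3 are our own choices for illustration; the check confirms that the formula passes through both points and agrees with the point-slope form f(a) + M(x − a) at an intermediate point.

```python
def secant_line(f, a, b):
    """Return S_{a,b} as a Python function, using the formula of Definition 4.6.5."""
    def s(x):
        return f(a) * (b - x) / (b - a) + f(b) * (x - a) / (b - a)
    return s


def secant_slope(f, a, b):
    """Return M_{a,b} = (f(b) - f(a)) / (b - a)."""
    return (f(b) - f(a)) / (b - a)


def f(x):
    return x * x


s = secant_line(f, 1.0, 3.0)
m = secant_slope(f, 1.0, 3.0)

print(s(1.0), f(1.0))                      # the line passes through (1, f(1))
print(s(3.0), f(3.0))                      # the line passes through (3, f(3))
print(s(2.0), f(1.0) + m * (2.0 - 1.0))    # agreement with point-slope form
```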


Theorem 4.6.6. Let I ⊆ ℝ be an open interval, and let f: I → ℝ be a function. The
following are equivalent.


a. If a, b ∈ I and a < b, then $f(x) \le S_{a,b}(x)$ for all x ∈ [a,b] (Function Lies Below
Its Secant Lines).

b. If a, b, c ∈ I and a < b < c, then $M_{a,b} \le M_{b,c}$ (Function Has Increasing Secant
Line Slopes).


Proof.


(a) ⇒ (b) Suppose that if a, b ∈ I and a < b, then $f(x) \le S_{a,b}(x)$ for all x ∈ [a,b].
Let a, b, c ∈ I. Suppose that a < b < c. Then b ∈ [a,c], and so by hypothesis we
know $f(b) \le S_{a,c}(b)$, which means that

\[
f(b) \le f(a)\,\frac{c - b}{c - a} + f(c)\,\frac{b - a}{c - a}.
\]

It is left to the reader to deduce from the above inequality that

\[
\frac{f(b) - f(a)}{b - a} \le \frac{f(c) - f(a)}{c - a}.
\]

A similar calculation shows that

\[
\frac{f(c) - f(a)}{c - a} \le \frac{f(c) - f(b)}{c - b}.
\]

Combining these last two inequalities, and using the definition of $M_{a,b}$, we see that
$M_{a,b} \le M_{b,c}$.


(b) ⇒ (a) Suppose that if a, b, c ∈ I and a < b < c, then $M_{a,b} \le M_{b,c}$.

Let a, b ∈ I. Suppose that a < b. Let x ∈ [a,b]. If x = a, then $S_{a,b}(x) = S_{a,b}(a) = f(a)$,
and if x = b then $S_{a,b}(x) = S_{a,b}(b) = f(b)$. Now suppose that a < x < b. Then
by hypothesis we know $M_{a,x} \le M_{x,b}$, which means that

\[
\frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(x)}{b - x}.
\]

It is left to the reader to deduce from the above inequality that

\[
f(x) \le f(a)\,\frac{b - x}{b - a} + f(b)\,\frac{x - a}{b - a},
\]

which means that $f(x) \le S_{a,b}(x)$.
We now use Theorem 4.6.6 as the basis for the following definition. 


Definition 4.6.7. Let I ⊆ ℝ be an open interval, and let f: I → ℝ be a function. The
function f is concave up if either of the two conditions in Theorem 4.6.6 holds. △
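
To see the two equivalent conditions of Theorem 4.6.6 side by side, the following Python sketch tests each of them on a finite grid of sample points. This is an illustration only: passing the test on a grid does not prove that a function is concave up, although failing it does show that the function is not. The function names, the grid on [−1, 1], the small tolerance for rounding error, and the two sample functions x² and x³ are our own choices.

```python
from itertools import combinations


def slope(f, a, b):
    """Secant slope M_{a,b}."""
    return (f(b) - f(a)) / (b - a)


def lies_below_secants(f, points, tol=1e-12):
    """Condition (a) of Theorem 4.6.6, tested only at the sample points."""
    for a, b in combinations(points, 2):        # points are sorted, so a < b
        for x in points:
            if a <= x <= b:
                s = f(a) * (b - x) / (b - a) + f(b) * (x - a) / (b - a)
                if f(x) > s + tol:
                    return False
    return True


def has_increasing_secant_slopes(f, points, tol=1e-12):
    """Condition (b) of Theorem 4.6.6, tested only at the sample points."""
    for a, b, c in combinations(points, 3):     # a < b < c
        if slope(f, a, b) > slope(f, b, c) + tol:
            return False
    return True


def square(x):
    return x * x


def cube(x):
    return x ** 3


grid = [-1.0 + 0.25 * k for k in range(9)]      # -1, -0.75, ..., 1

for g in (square, cube):
    print(g.__name__, lies_below_secants(g, grid), has_increasing_secant_slopes(g, grid))
```

On this grid both tests succeed for x² and both fail for x³, which is concave down on the left half of the interval; the two conditions agree, as the theorem says they must.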


In the differentiable case, we can use both the first derivative and second derivative 
to characterize when a function is concave up; a similar result holds for concave down. 
The first part of the following theorem shows that the usual definition of concave up 
given in calculus courses is equivalent to the geometric approach of Theorem 4.6.6. 


Theorem 4.6.8. Let I ⊆ ℝ be an open interval, and let f: I → ℝ be a function.


1. Suppose that f is differentiable. Then the two conditions in Theorem 4.6.6
hold if and only if f′ is increasing on I.
2. Suppose that f is twice differentiable. Then the two conditions in
Theorem 4.6.6 hold if and only if f″(x) ≥ 0 for all x ∈ I.


Proof. 


(1) Because of Theorem 4.6.6, it will suffice to prove that f’ is increasing if and 
only if Theorem 4.6.6 (b) holds. 

Suppose that f′ is increasing. Let a, b, c ∈ I. Suppose that a < b < c. Because f
is differentiable, then it is continuous by Theorem 4.2.4. The Mean Value Theorem
(Theorem 4.4.4) applied to each of $f|_{[a,b]}$ and $f|_{[b,c]}$ implies that there are p ∈ (a,b)
and q ∈ (b,c) such that $f'(p) = M_{a,b}$ and $f'(q) = M_{b,c}$. Because p < q and f′ is increasing, we
see that f′(p) ≤ f′(q), and hence $M_{a,b} \le M_{b,c}$. It follows that Theorem 4.6.6 (b) holds.

Now suppose that Theorem 4.6.6 (b) holds. Let x, y ∈ I. Suppose that x < y.
Because I is open there are w, z ∈ I such that w < x and y < z. Then by hypothesis we
know that $M_{w,x} \le M_{x,y}$ and $M_{x,y} \le M_{y,z}$. By the definition of derivatives, combined
with Lemma 3.2.17, we see that $f'(x) = \lim_{w \to x^-} M_{w,x}$ and $f'(y) = \lim_{z \to y^+} M_{y,z}$. Using the
analogs for one-sided limits of Theorem 3.2.13 and Exercise 3.2.1, we deduce that
$f'(x) \le M_{x,y} \le f'(y)$. It follows that f′ is increasing.


(2) This part of the theorem follows immediately from Part (1) of this theorem 
together with Theorem 4.5.2 (1) applied to f’. 


Reflections 


The concepts of concave up and concave down are partly analogous to the concepts 
of increasing and decreasing, though with second derivatives instead of first derivatives. 
The reader might have noticed, however, that we did not take this analogy as far as it 
can go, in that we did not discuss the second derivative analog of local extrema. Of 
course, there is such an analog, namely, inflection points, which the reader has seen in 
calculus courses. This analogy is almost, but not entirely, complete. Recall from the 
First Derivative Test (Theorem 4.5.9) that a local extremum occurs where the function 
changes from increasing to decreasing or vice versa. Whereas the First Derivative Test 
is a theorem about local extrema, and is not the definition of this concept, the analog 
for second derivatives of the idea in the First Derivative Test is taken as the definition 
of inflection points, which are numbers where the function changes from concave up 
to concave down or vice versa. Moreover, in contrast to the distinction made between 
the two types of local extrema, namely, local maxima and local minima, no distinction 
is made between the type of inflection point where the function changes from concave 
up to concave down, and the type of inflection point where the function changes from 
concave down to concave up. 

The analogy between local extrema and inflection points is, nonetheless, a very 
useful one. To find local extrema, we defined critical points, and then proved that 
those are the only places we need to look for local extrema, though not all critical 
points are local extrema. Similarly, we could define something that might be called 
“second critical points.” That is, a point c is a second critical point if either f is twice 
differentiable at c and f″(c) = 0, or f is not twice differentiable at c. Then the second
critical points would be the only places we need to look for inflection points, though
not all second critical points are inflection points; the analog of Theorem 4.5.9 would 
then say that a second critical point is an inflection point if and only if f″(x) changes
from positive to negative or vice versa at the second critical point. No one seems to
use the term “second critical point,” though perhaps it would be a nice idea. For the
sake of leaving room for more important topics, we do not give a rigorous treatment 
of inflection points in this text. 


Exercises 


Exercise 4.6.1. [Used in Lemma 4.6.2.] Prove Lemma 4.6.2 (1) (3). 
Exercise 4.6.2. [Used in Lemma 4.6.2.] Prove Items (b)—(d) of Lemma 4.6.2 (2). 


Exercise 4.6.3. [Used in Exercise 4.6.5, Exercise 5.7.5, Theorem 6.4.12, Lemma 7.3.7
and Exercise 7.3.5.] Let [a,b] ⊆ ℝ be a non-degenerate closed bounded interval, and
let f: [a,b] → ℝ be a function. Suppose that f is continuous and strictly monotone.


(1) Prove that if f is strictly increasing, then f([a,b]) = [f(a), f(b)] and the
function f: [a,b] → [f(a), f(b)] is bijective; and that if f is strictly decreasing,
then f([a,b]) = [f(b), f(a)] and the function f: [a,b] → [f(b), f(a)] is
bijective.

(2) Prove that f⁻¹: f([a,b]) → [a,b] is continuous, and is strictly increasing or
strictly decreasing if f is strictly increasing or strictly decreasing, respectively.


Exercise 4.6.4. [Used in Section 4.6.] Let I ⊆ ℝ be a non-degenerate open interval,
and let f: I → ℝ be a function.


(1) Suppose that f is continuous. Let x, p, q, y ∈ I. Suppose that x < p < q < y,
and that f(x) < f(p) and f(q) > f(y). Prove that f is not injective.
(2) Suppose that f is continuous. Let c, d ∈ I. Suppose that f is differentiable
at c and d, and that f′(c) > 0 and f′(d) < 0. Prove that f is not injective.
[Use Exercise 4.5.5.]
(3) Suppose that f is differentiable and injective. Prove that f′(x) ≥ 0 for all x ∈ I,
or that f′(x) ≤ 0 for all x ∈ I. Deduce that f is monotone.
(4) Suppose that f is differentiable and injective. Prove that f(I) is a non-degenerate
open interval. [Use Exercise 4.5.1.]


Exercise 4.6.5. [Used in Exercise 7.4.1.] Let (a,b) ⊆ ℝ be a non-degenerate open
bounded interval, and let f: (a,b) → ℝ be a function. Suppose that f is continuous,
strictly increasing and bounded. Let F: [a,b] → ℝ be defined by

\[
F(x) = \begin{cases} \operatorname{glb} f((a,b)), & \text{if } x = a \\ f(x), & \text{if } a < x < b \\ \operatorname{lub} f((a,b)), & \text{if } x = b. \end{cases}
\]

(1) Prove that F is continuous.
(2) Prove that F is strictly increasing.
(3) Prove that F([a,b]) = [glb f((a,b)), lub f((a,b))]. [Use Exercise 4.6.3 (1).]


Exercise 4.6.6. [Used in Theorem 6.3.9 and Theorem 6.4.12.] Let (a,b] ⊆ ℝ be a
non-degenerate half-open interval, and let f: (a,b] → ℝ be a function. Suppose that f is
strictly increasing.


(1) Prove that one of the following holds.
a. If f((a,b]) is bounded below, then f((a,b]) = (glb f((a,b]), f(b)].
b. If f((a,b]) is not bounded below, then f((a,b]) = (−∞, f(b)].
(2) Suppose that f is continuous at b. Prove that f⁻¹: f((a,b]) → (a,b] is
continuous at f(b). [Use the one-sided analog of Exercise 2.3.8.]


Exercise 4.6.7. Let $I \subseteq \mathbb{R}$ be an open interval, and let $f\colon I \to \mathbb{R}$ be a function. Suppose 
that $f$ is concave up. Prove that if $a, b, c \in I$ and $a < b < c$, then $M_{a,b} \le M_{a,c} \le M_{b,c}$. 


Exercise 4.6.8. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that if $r, s \in (a,b)$ and $r < s$, then $M_{a,r} \le M_{s,b}$. 
Is $f|_{(a,b)}$ necessarily concave up? Give a proof or a counterexample. 


Exercise 4.6.9. Let $I \subseteq \mathbb{R}$ be a non-degenerate open interval, and let $f\colon I \to \mathbb{R}$ be a 
function. 


(1) Let $[a,b] \subseteq I$ be a non-degenerate closed bounded interval. Prove that if 
the set $\{M_{p,q} \mid p, q \in [a,b] \text{ and } p < q\}$ is bounded, then $f|_{[a,b]}$ is uniformly 
continuous. [Use Exercise 3.4.5.] 
(2) Prove that if $f$ is concave up, then $f$ is continuous. [Use Exercise 3.3.2.] 
(3) Is Part (2) of this exercise true if $I$ is a closed interval? Give a proof or a 
counterexample. 


Exercise 4.6.10. Let $I \subseteq \mathbb{R}$ be an open interval, and let $f\colon I \to \mathbb{R}$ be a function. The 
function $f$ is convex if $a, b \in I$ and $a < b$ imply $f(ta + (1-t)b) \le t f(a) + (1-t) f(b)$ 
for all $t \in [0,1]$. 

Prove that $f$ is concave up if and only if $f$ is convex. 


4.7 Historical Remarks 


Of the two fundamental topics in calculus, differentiation and integration, the former 
appears to the modern student to be much simpler than the latter, a view that will be 
maintained when the reader encounters the treatment of integration in Chapter 5. As 
such, it is no surprise that in today’s calculus and real analysis texts, differentiation is 
almost always taught before integration (a notable exception being the classic text 
[Apo67]). Historically, however, differentiation has a much less rich history than 
integration, the latter having strong roots in the ancient world, and the former having 
to wait until the 17th century for substantial treatment. 

The late historical development of differentiation in contrast to integration should 
come as no surprise. The basic question that gave rise to the study of integration is the 
computation of areas of regions of the plane and volumes of regions of space, and such 
questions certainly arose very early in human civilization, for example in architecture, 
farming, art, commerce and the like. By contrast, the notion of the rate of change of a 
function, which is the essence of derivatives, requires the concept of a function, which 
is a much later historical development than the notion of area, with functions as we 
think of them today making their appearance only in very preliminary form in the 
14th century, and in a more developed form in the 18th century. Prior to the invention 
of analytic geometry in the 17th century, curves in the plane were viewed as geometric 
objects, rather than the graphs of functions, and the question of rate of change is not 
nearly as natural a question for curves as it is for graphs of functions. If one views a 
curve as the motion of a particle, then one can consider its velocity vector, but that too 
came relatively late historically. It is possible to consider tangent lines to curves from 
a strictly geometric viewpoint, but it is difficult to come up with a precise geometric 
definition of what tangent lines are. In ancient Greece, and until close to when calculus 
was invented, a tangent line to a curve was viewed as a line that touched the curve in 
one point, or that satisfied some other similar geometric definition. Such an approach 
is not strictly correct, as can be seen by Example 4.2.5 (1), where the function crosses 
its tangent line at x = 0 infinitely many times as 0 is approached. However, it is hard 
to come up with a better definition of tangent lines without the function concept, or 
without thinking of a curve as representing the motion of a particle. 


Ancient World 


Euclid (c. 325-c. 265 BCE) discussed tangent lines to circles in Book III of the 
Elements, Archimedes (287-212 BCE) discussed tangent lines to what is now called 
the Archimedean spiral, and Apollonius (c. 262-c. 190 BCE) discussed tangent 
lines to conic sections. The ancient Greeks, and the rest of the ancient world, did not 
appear to know much more than that about tangent lines to curves. 


Medieval Period 


Bhaskara II (1114-1185), also known as Bhaskaracharya, appeared to have conceived 
of the basic ideas of differentiation in Siddhanta Siromani of 1150, which was pri- 
marily about astronomy, but also contained some mathematics. He had the idea of 
locating maxima and minima where the derivative is zero, had a version of Rolle’s 
Theorem and had the equivalent of the fact that the derivative of sine is cosine. 

Nicole Oresme (1323-1382) observed, via his graphical representation of vari- 
ation, a special case of the idea of locating maxima and minima where the rate of 
change is zero. 


Seventeenth Century 


The modern study of tangent lines started in the first half of the 17th century, a 
time when mathematics was in general advancing very rapidly. The approach to 
geometry in this period was very different from the impressive but restrictive ancient 
Greek approach; algebra, which was underdeveloped in ancient Greece, had shown 
considerable development in the meantime; analytic geometry had recently been 
developed, and many new curves were studied, giving further impetus to the need to 
find tangent lines to curves; the idea of a function was starting to take shape (though 
the fully modern approach to functions was yet to be developed). In general, in keeping 
with the spirit of the times, a practical problem-solving approach to mathematics had 
developed by the 17th century. 

Johannes Kepler (1571-1630), in Nova stereometria doliorum vinariorum of 1615, 
wanted to provide practical methods for finding volumes of wine casks. In particular, 
he found maximum volumes using an experimental approach, listing volumes for 
given dimensions, and then selecting the best. In the process, he noticed that as one 
got closer to the maximum volume, the amount that the volume changed for a given 
amount of change in the dimensions of the solid decreased until it was negligible, which 
is essentially a recognition that the maximum is found when the rate of change is 
zero. 

A major step forward in the study of tangent lines, not long before the invention of 
calculus, was due to Pierre de Fermat (1601-1665), who in the late 1620s found the 
maximum and minimum values of curves by considering what we write as $\frac{f(x+e)-f(x)}{e}$, 
dividing as if e is non-zero, and then dropping e at some point in order to get the 
answer to the problem. Fermat also used this method to find tangent lines to curves. 
Fermat’s argument resembles the use of infinitesimals, though he did not clearly 
explain if that is how he understood it. Moreover, whereas the idea of the derivative is 
implicit in Fermat’s method, he did not appear to have recognized the concept of the 
derivative per se. 

René Descartes (1596-1650), in the appendix La Géométrie of the philosophical 
work Discours de la méthode pour bien conduire sa raison et chercher la vérité 
dans les sciences of 1637, found normal lines (which are perpendicular to tangent 
lines) to some curves by intersecting them with circles, and having the two points of 
intersection get closer and closer, which led to a double root of an equation when the 
points are thought of as having merged. 

In the 1630s and 1640s Evangelista Torricelli (1608-1647) and Gilles de Roberval 
(1602-1675) helped advance the notion of viewing a curve in the plane as the path 
of a moving object (so that each of x and y is a function of time), which allowed 
for tangent lines to be viewed as lines of instantaneous motion. In the 1650s and 
early 1660s Johann Hudde (1628-1704), René de Sluse (1622-1685) and Christiaan 
Huygens (1629-1695) independently discovered algorithmic rules for computing the 
slopes of the tangent lines of arbitrary algebraic curves. Whereas today we always 
approach tangent line computations via derivatives, because that is certainly the most 
convenient way to do so, the method of Hudde and Sluse was not based upon the 
ideas that eventually became calculus. Calculus is so convenient that we tend to forget 
that some (though not all) of the problems that are now solved with calculus can also 
be solved without it. 

A number of mathematicians, including James Gregory (1638-1675), Isaac Barrow 
(1630-1677) and Blaise Pascal (1623-1662), used a “differential triangle,” which 
is an infinitesimal right triangle with hypotenuse that is tangent to the curve, and 
with sides parallel to the coordinate axes. In contrast to the others, who used this 
triangle for tangent problems, Pascal used it as part of an area problem, but it was 
from Pascal’s work of 1658 that Leibniz learned of this idea, which was important to 
his understanding of the derivative. 

Further progress in the development of the derivative was due to Isaac Barrow 
(1630-1677), who was Newton’s predecessor as the Lucasian Professor of Mathematics 
at Cambridge. In his lectures in the mid-1660s, which might have been attended 
by Newton, Barrow computed slopes of tangent lines by implicitly using the idea 
of approximating tangent lines with secant lines, and dropping higher powers of 
infinitesimals. 


Newton and Leibniz 


The next step in the development of the derivative was part of the larger invention 
of calculus by Isaac Newton (1643-1727) and Gottfried von Leibniz (1646-1716). 
On the one hand, Newton and Leibniz (independently, and in some ways rather 
complementarily) moved our understanding of the derivative forward substantially. 
On the other hand, Newton and Leibniz invented calculus not in a vacuum but rather 
in the context of the substantial mathematical activity that preceded them. What sets 
Newton and Leibniz apart from their predecessors is that they went beyond solving 
particular tangent or area problems, and recognized that behind the various particular 
cases was a general method for treating these sorts of problems, and they then worked 
out many of the details of this general method. 

The first to conceive of what we now call calculus was Newton, who worked out 
the basics of his version of it in the period 1665-1666. In his first (and unpublished) 
paper on calculus, referred to as the October 1666 Tract on Fluxions, Newton studied 
the tangent problem by thinking of a point moving along a curve given by an equation 
of the form $f(x,y) = 0$, and considering what he later wrote as $\dot{x}$ and $\dot{y}$ (and we write 
as $\frac{dx}{dt}$ and $\frac{dy}{dt}$). The derivative as we know it today was $\frac{\dot{y}}{\dot{x}}$. Newton calculated the 
derivative of any algebraic curve by using infinitesimals and implicit differentiation. 
He essentially worked out the Chain Rule, and showed how to take derivatives of 
products and quotients, though he did not explicitly formulate the Product Rule and the 
Quotient Rule. He also recognized the importance of what we call antidifferentiation. 
In his unpublished Tractatus de methodis serierum et fluxionum of 1671, Newton 
found maxima and minima by setting the derivative equal to zero and solving. 

Leibniz, who worked out his version of calculus in the period 1675-1677, had a 
very different conceptual approach than Newton. Rather than thinking of derivatives in 
terms of a point moving along a curve and the rates of change of its x and y coordinates, 
Leibniz considered infinitesimal changes in x and y, which he denoted dx and dy, and 
thought of the tangent line to a curve as a secant line connecting two infinitesimally 
close points on the curve. The derivative was the ratio of the differentials dy and dx. 
Leibniz worked out the derivative of power functions, as well as the Product Rule 
and Quotient Rule, though he wrote everything in terms of differentials rather than 
derivatives, for example d(xy) = xdy + ydx. He observed that dv is positive when v is 
increasing, and analogously for decreasing, and hence that local extrema occur only 
when dv = 0, and that inflection points occur only when d(dv) = 0. As an application, 
Leibniz gave a derivation of Snell’s Law of Refraction (which was already known at 
the time), just as we would do today in a calculus course. 

The development of calculus is not only about derivatives, and it is not possible 
to see the full range of Newton’s and Leibniz’s accomplishments without seeing 
their contributions to other aspects of calculus as well, especially integration; see 
the historical discussion for other chapters for details. It is not worth dwelling on 
the famous dispute about whether Newton or Leibniz should be given priority in the 
invention of calculus. The modern view is that each formulated his version of calculus 
independently of the other; Newton’s work came first, though Leibniz was the first 
to publish. Newton was the better mathematician of the two—he is viewed by many 
as one of the three greatest mathematicians of all time, together with Archimedes 
and Gauss. On the other hand, Leibniz’s approach to calculus, and in particular his 
notation, had a larger impact on the immediate development of the subject. They both 
deserve to share the credit. 


Eighteenth Century 


Newton and Leibniz were, in keeping with the level of mathematical rigor of their 
era, not overly careful with the use of infinitesimals (though Newton’s views on 
infinitesimals changed over time). A rigorous treatment of the real numbers was not 
developed until the 19th century, and it is only with a rigorous foundation for the real 
numbers that all the theorems of calculus can be proved. Nonetheless, questions about 
the use of infinitesimals were raised much earlier, notably by George Berkeley (1685- 
1753) in his essay The Analyst; or, A Discourse Addressed to an Infidel Mathematician 
of 1734. Berkeley pointed out some philosophical and logical problems with both 
Newton’s and Leibniz’s approach to derivatives. For example, Berkeley correctly 
pointed out the logical problem that occurs when computing $\frac{f(x+h)-f(x)}{h}$ using the 
approach of his day (before the invention of limits), where one first assumes that $h$ is 
non-zero in order to divide by it, and one then subsequently assumes that $h$ is zero in 
order to drop terms containing $h$ from consideration. Of course, even with the dubious 
use of infinitesimals, calculus right from the beginning proved to be very useful, and 
so the response to such criticisms was not to abandon calculus, but rather to find better 
foundations for it. 

One response to Berkeley was by Jean d’Alembert (1717-1783), in the article 
Différentiel of 1754, which was published in the influential French Encyclopédie, 
of which d’Alembert was an editor. In that article d’Alembert proposed that the 
derivative be viewed as $\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$, rather than as Newton’s ratio of fluxions or Leibniz’s 
ratio of differentials. He did not have a rigorous definition of limits, but his approach 
was nonetheless a step forward in the development of the derivative as we now know 
it. 

Joseph-Louis Lagrange (1736-1813), in Théorie des fonctions analytiques of 
1797, attempted to avoid both infinitesimals and limits by viewing all functions as 
power series, and then picking off the derivative as a certain coefficient in such 
series. Lagrange also introduced the term “derivative” (“fonction dérivée” in the 
original) and the notation “f’(x).” It was subsequently shown by Cauchy that not 
every differentiable function can be written as a power series, and hence Lagrange’s 
approach to avoiding infinitesimals and limits was not satisfactory. 


Nineteenth Century 


It was Augustin Louis Cauchy (1789-1857) who brought the derivative, and much of 
calculus, into the modern form with which we are familiar today. Cauchy’s approach 
to the derivative was influenced by Sylvestre François Lacroix (1765-1843), who 
wrote some calculus textbooks starting in 1797 that were widely used, but which 
Cauchy found not entirely rigorous; both Lacroix and Cauchy taught at the École 
Polytechnique, and wrote their texts to be able to teach calculus in ways they viewed 
as satisfactory. Cauchy’s work on calculus is found in three important textbooks he 
wrote in the 1820s. Cauchy’s predecessors (with the exception of d’Alembert and 
Lagrange) took the derivative as the starting point of calculus, whereas Cauchy started 
with limits as the fundamental concept, and computed the derivative by the familiar 
formula $\lim_{h \to 0} \frac{f(x+h)-f(x)}{h}$. Cauchy introduced the Chain Rule as we know it, although 
he gave the mistaken proof mentioned right after the statement of Theorem 4.3.3. 
Cauchy was also the first person to give the Mean Value Theorem its now central 
role, although the current approach of proving the Mean Value Theorem from Rolle’s 
Theorem is due later to Pierre Bonnet (1819-1892). 


5 


Integration 


5.1 Introduction 


Having looked at differentiation in Chapter 4, we now turn to the other main part of 
calculus, namely, integration. It is important to understand that, although calculus 
is unified by the Fundamental Theorem of Calculus, each of differentiation and 
integration has its own motivation and technical details, and in principle can be treated 
separately up until the Fundamental Theorem of Calculus, and studied in either order. 

When we refer to integration at this point in the text we mean “definite integration.” 
Indefinite integration is another name for antidifferentiation, and is defined solely in 
terms of differentiation. The “real” integration is definite integration, and the definition 
of this type of integral has nothing to do with differentiation per se. The Fundamental 
Theorem of Calculus, which relates definite integration and differentiation, is an 
amazing and surprising fact, and is not simply a matter of definition. 

In our treatment of integration we will be using the terms “Riemann sum” and 
“Riemann integral.” As the reader might guess from this terminology, there are other 
kinds of integrals as well, the most well-known being the “Lebesgue integral.” We 
will not treat these other types of integrals, but it is worth knowing that they exist. 
All of these types of integrals agree on continuous functions, but sometimes differ 
on more complicated functions. If we just say “integral,” we mean the Riemann 
integral. See [Str00, Chapter 14] for the Lebesgue integral in R, and see [Bar96] for 
the Henstock—Kurzweil integral (also known as the generalized Riemann integral or 
the gauge integral). 

As was the case for derivatives, it is assumed that the reader has seen integrals 
in a calculus course, and so we will spend little time on intuitive motivation, and not 
discuss applications or computational examples at all. 


5.2 The Riemann Integral 


The geometric motivation for integration is to find the area of curved regions in the 
plane. It is easy to find the area of simple shapes such as rectangles and triangles, 
and from there it is also possible to find the area of any polygon by cutting it up into 
rectangles and triangles. By contrast, the area of curved regions is much harder to 
compute. 

The simplest type of curved region in the plane is the region under the graph 
of a function, and that is the type of area directly addressed by integration. The 
fundamental idea of the Riemann integral is to approximate the area under the graph 
of a function by approximating the region with rectangles (the areas of which are easy 
to find), then adding up the areas of the rectangles, and finally taking the limit as the 
rectangles get thinner and thinner. Of course, as with any other type of 
limit, not all limits of this type exist, and that corresponds to when the function is 
not integrable, which means geometrically that the function is so wildly behaved that 
we cannot assign a number in a meaningful way to the area of the region beneath the 
graph of the function. 

There are a variety of ways that the region under the graph of a function might 
be approximated with rectangles. Two standard ways are to use right-hand sums and 
left-hand sums, as seen respectively in Figure 5.2.1 (i) and (ii). Suppose that $f\colon A \to \mathbb{R}$ is 
a function, where $A \subseteq \mathbb{R}$ is a set, and we want to find the area under the graph of $f$ 
and above an interval $[a,b] \subseteq A$. To compute a right-hand sum, we divide the interval 
[a,b] into smaller subintervals (in the figure there are seven such subintervals), and 
we then form a rectangle above each subinterval, where the height of the rectangle 
equals the value of the function f at the right endpoint of the subinterval; we then 
add up the areas of the rectangles, to obtain an approximation of the area under the 
graph of the function. Left-hand sums are similar to right-hand sums with the obvious 
modification. Intuitively, to find the exact area under the graph of the function, we 
then need to take some sort of limit as the subintervals get smaller and smaller. 
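
The right-hand and left-hand sums just described can be sketched computationally. The following Python fragment is only an illustration and not part of the formal development: the function names `right_hand_sum` and `left_hand_sum` are our own, the subintervals are assumed to have equal width, and the endpoints are assumed to be integers or fractions so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def right_hand_sum(f, a, b, n):
    """Right-hand sum of f over [a, b] using n subintervals of equal width."""
    width = Fraction(b - a, n)
    # The i-th subinterval is [a + (i-1)*width, a + i*width]; use its right endpoint.
    return sum(f(a + i * width) * width for i in range(1, n + 1))


def left_hand_sum(f, a, b, n):
    """Left-hand sum of f over [a, b] using n subintervals of equal width."""
    width = Fraction(b - a, n)
    # Same subintervals as above, but the height is taken at the left endpoint.
    return sum(f(a + (i - 1) * width) * width for i in range(1, n + 1))


def square(x):
    return x * x


# Seven subintervals as in the figure, then finer subdivisions.
for n in (7, 70, 700):
    print(n, float(left_hand_sum(square, 0, 2, n)), float(right_hand_sum(square, 0, 2, n)))
```

Because the function $x^2$ is increasing on $[0,2]$, each left-hand sum is no larger than the corresponding right-hand sum, and the two values approach each other as the number of subintervals grows.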


Fig. 5.2.1. (i) Right-hand sum; (ii) left-hand sum. 


The Riemann integral is based upon a generalization of right-hand sums and 
left-hand sums. Although it is convenient for computational purposes to subdivide the 
interval [a,b] into subintervals of equal length, and to use something uniform such as 
right endpoints or left endpoints, from a theoretical point of view we want to make 
sure that we do not miss any bad behavior of the function, and so we need to look 
at sums of rectangles where the subintervals are not necessarily of equal length, and 
where the height of each rectangle is the value of the function at some point in the 
corresponding subinterval, but not necessarily endpoints, and not necessarily in the 
“same location” in each subinterval. A more general sum of this sort is illustrated 
in Figure 5.2.2. The notation used in this figure will be explained in the following 
definition, after which we will give the definition of these more general sums. 


Definition 5.2.1. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval. 


1. A partition of $[a,b]$ is a set $P = \{x_0, x_1, \ldots, x_n\}$ such that 
$a = x_0 < x_1 < \cdots < x_n = b$, for some $n \in \mathbb{N}$. 

2. If $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of $[a,b]$, the norm (also called the mesh) 
of $P$, denoted $\|P\|$, is defined by 

\[
\|P\| = \max\{x_1 - x_0, x_2 - x_1, \ldots, x_n - x_{n-1}\}.
\]

3. If $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of $[a,b]$, a representative set of $P$ is a set 
$T = \{t_1, t_2, \ldots, t_n\}$ such that $t_i \in [x_{i-1}, x_i]$ for all $i \in \{1, \ldots, n\}$. △ 


We note that the definition of representative sets in Definition 5.2.1, while 
intuitively correct, has a slight technical problem. It could happen that $t_{i-1} = t_i$ for 
some $i \in \{2, \ldots, n\}$, in which case writing “$T = \{t_1, t_2, \ldots, t_n\}$” would lead to the 
set $T$ having fewer than $n$ elements, because a single element is never written twice 
in a set, and yet we want $T$ to have one element for each of the $n$ subintervals of 
$[a,b]$. Therefore, the technically correct way to define a representative set would be 
as a function $T\colon \{1, \ldots, n\} \to [a,b]$ such that $T(i) \in [x_{i-1}, x_i]$ for all $i \in \{1, \ldots, n\}$. 
However, because no problem will arise, we will use the more convenient notation 
of Definition 5.2.1 (3) and write a representative set as $T = \{t_1, t_2, \ldots, t_n\}$, where we 
think of $t_i = T(i)$ for all $i \in \{1, \ldots, n\}$. 
The term “partition” as used here does not have the same meaning as the term 
“partition” that is used in the context of equivalence relations (as discussed in [Blo10, 
Section 5.3]); both uses of the term “partition” are quite standard, and the proper 
meaning is easy to tell from the context. The term “representative set,” by contrast, 
is not standard, but there does not appear to be a universally accepted term for this 
concept (some books do not even give this concept a name). 
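
The three parts of Definition 5.2.1 translate directly into code. In the following Python sketch, which is an illustration only, the names `is_partition`, `norm` and `is_representative_set` are our own, a partition is stored as a list of points, and a representative set is stored as a list with one entry per subinterval. Storing the representative set as a list rather than as a set is in the spirit of viewing it as a function on $\{1, \ldots, n\}$, because a list may contain the same number twice.

```python
from fractions import Fraction


def is_partition(points, a, b):
    """Check that a = x_0 < x_1 < ... < x_n = b with n >= 1."""
    return (
        len(points) >= 2
        and points[0] == a
        and points[-1] == b
        and all(x < y for x, y in zip(points, points[1:]))
    )


def norm(points):
    """The norm (mesh) of a partition: the largest subinterval width."""
    return max(y - x for x, y in zip(points, points[1:]))


def is_representative_set(tags, points):
    """Check that there is one tag per subinterval and that it lies in that subinterval."""
    n = len(points) - 1
    return len(tags) == n and all(
        points[i] <= tags[i] <= points[i + 1] for i in range(n)
    )


# A partition of [0, 2] with three subintervals of unequal width.
P = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)]
# The first two tags coincide, which is allowed: 1/2 lies in [0, 1/2] and in [1/2, 1].
T = [Fraction(1, 2), Fraction(1, 2), Fraction(3, 2)]

print(is_partition(P, 0, 2), norm(P), is_representative_set(T, P))
```

The example deliberately uses a repeated tag, which is the situation in which writing the representative set with set braces is slightly inaccurate.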


Definition 5.2.2. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let 
$f\colon [a,b] \to \mathbb{R}$ be a function, let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a,b]$ and let 
$T = \{t_1, t_2, \ldots, t_n\}$ be a representative set of $P$. The Riemann sum of $f$ with respect 
to $P$ and $T$, denoted $S(f,P,T)$, is defined by 

\[
S(f,P,T) = \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}). 
\]
△ 


Example 5.2.3. 


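(1) Let $f\colon [0,2] \to \mathbb{R}$ be defined by $f(x) = x^2$ for all $x \in [0,2]$. Let $n \in \mathbb{N}$. Let 
$P_n = \{0, \frac{2}{n}, \frac{4}{n}, \ldots, \frac{2n}{n}\}$. Then $P_n$ is a partition of $[0,2]$, and $\|P_n\| = \frac{2}{n}$. Let 
$T_n = \{\frac{2}{n}, \frac{4}{n}, \ldots, \frac{2n}{n}\}$. Then $T_n$ is a representative set of $P_n$. The Riemann sum 
$S(f,P_n,T_n)$ is an example of a right-hand sum. Using Proposition 2.5.2 we see that 

\[
S(f,P_n,T_n) = \sum_{i=1}^{n} \left(\frac{2i}{n}\right)^2 \cdot \frac{2}{n} = \frac{8}{n^3} \sum_{i=1}^{n} i^2 = \frac{8}{n^3} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{4(n+1)(2n+1)}{3n^2}. 
\]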


We were able to compute an explicit formula for S(f,P,,T,) in terms of n only 
because the function f was so simple that we had the convenient formula from Propo- 
sition 2.5.2 available; for more complicated functions it is rarely possible to find such 
explicit formulas for Riemann sums. 

(2) Let $r\colon [0,1] \to \mathbb{R}$ be defined by 

\[
r(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0, & \text{otherwise.} \end{cases}
\]

Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[0,1]$. By Theorem 2.6.13 (1) it is possible 
to choose a representative set $T = \{t_1, t_2, \ldots, t_n\}$ of $P$ such that $t_1, t_2, \ldots, t_n$ are all 
rational numbers. Then 

\[
S(r,P,T) = \sum_{i=1}^{n} r(t_i)(x_i - x_{i-1}) = \sum_{i=1}^{n} 1 \cdot (x_i - x_{i-1}) = x_n - x_0 = 1. 
\]

On the other hand, by Theorem 2.6.13 (2) it is possible to choose a representative set 
$S = \{s_1, s_2, \ldots, s_n\}$ of $P$ such that $s_1, s_2, \ldots, s_n$ are all irrational numbers. It is then seen 
that $S(r,P,S) = 0$. It is also possible to choose a representative set $U = \{u_1, u_2, \ldots, u_n\}$ 
of $P$ such that some of $u_1, u_2, \ldots, u_n$ are rational and some are irrational, in which 
case $0 < S(r,P,U) < 1$. We therefore see that, at least for some functions, the choice 
of representative set can make a big difference when computing Riemann sums. ♦ 
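
Both parts of Example 5.2.3 can be checked by a short computation. The Python sketch below is an illustration under stated assumptions, and the helper names (`riemann_sum`, `right_hand_sum_of_square`, `closed_form`, `r_on_flag`) are our own. For Part (1) exact fractions are used, so the computed right-hand sums can be compared with the closed formula. For Part (2) a computer cannot decide whether a floating-point value stands for a rational or an irrational number, so as a modeling assumption each element of the representative set is replaced by a flag recording whether a rational or an irrational point was chosen in that subinterval; both choices are possible by Theorem 2.6.13.

```python
from fractions import Fraction


def riemann_sum(f, points, tags):
    """S(f, P, T): the sum of f(t_i) * (x_i - x_{i-1}) over the subintervals of P."""
    return sum(
        f(t) * (right - left)
        for t, left, right in zip(tags, points, points[1:])
    )


# Part (1): f(x) = x^2 on [0, 2], with P_n = {0, 2/n, ..., 2n/n} and right endpoints as T_n.
def right_hand_sum_of_square(n):
    points = [Fraction(2 * i, n) for i in range(n + 1)]
    tags = points[1:]
    return riemann_sum(lambda x: x * x, points, tags)


def closed_form(n):
    """The formula 4(n+1)(2n+1) / (3n^2) obtained in Part (1)."""
    return Fraction(4 * (n + 1) * (2 * n + 1), 3 * n * n)


# Part (2): a modeling assumption, not a real test of rationality.
# A flag of True means "a rational point was chosen", False means "an irrational one".
def r_on_flag(chose_rational):
    return 1 if chose_rational else 0


partition = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
all_rational = riemann_sum(r_on_flag, partition, [True, True, True])
all_irrational = riemann_sum(r_on_flag, partition, [False, False, False])
mixed = riemann_sum(r_on_flag, partition, [True, False, True])

print(all(right_hand_sum_of_square(n) == closed_form(n) for n in range(1, 50)))
print(all_rational, all_irrational, mixed)
```

The same partition yields three different Riemann sums for $r$, depending only on which kind of point is selected in each subinterval.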
We now turn to the definition of the Riemann integral. The intuitive idea is that 
the integral exists if there is some real number that all Riemann sums get closer and 
closer to, as the rectangles in the Riemann sums get thinner and thinner. 
The Riemann integral is a type of limit, and similarly to the definition of limits of 
functions in Definition 3.2.1, we use an $\varepsilon$-$\delta$-type approach for the definition of the 
Riemann integral. If the $\varepsilon$-$\delta$ formulation of the Riemann integral appears somewhat 
more complicated than the $\varepsilon$-$\delta$ formulation of limits of functions, that is because 
for integrals we need to take into account all possible partitions, and all possible 
representative sets of each partition. 


Definition 5.2.4. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let 
$f\colon [a,b] \to \mathbb{R}$ be a function and let $K \in \mathbb{R}$. The number $K$ is the Riemann integral 
of $f$, written 

\[
\int_a^b f(x)\,dx = K, 
\]

if for each $\varepsilon > 0$, there is some $\delta > 0$ such that if $P$ is a partition of $[a,b]$ with $\|P\| < \delta$, 
and if $T$ is a representative set of $P$, then $|S(f,P,T) - K| < \varepsilon$. If the Riemann integral 
of $f$ exists, we say that $f$ is Riemann integrable. △ 


It is important to stress that in Definition 5.2.4, when it says “if $T$ is a representative 
set of $P$,” that means that $T$ can be any representative set of $P$; it is not sufficient 
to show that if $P$ is a partition of $[a,b]$ with $\|P\| < \delta$, then $|S(f,P,T) - K| < \varepsilon$ 
for some choice of representative set $T$. We will see the importance of this fact in 
Example 5.2.6 (3). 
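
The roles of $\varepsilon$ and $\delta$ in Definition 5.2.4 can be explored numerically. The following Python sketch is an experiment, not a proof: it samples only finitely many partitions and representative sets, whereas the definition quantifies over all of them. The helper names are our own. We take $f(x) = x^2$ on $[0,2]$, assume the value $K = \frac{8}{3}$ familiar from calculus, and choose $\delta = \frac{\varepsilon}{8}$; this choice is our own, and one can check that it suffices by using the estimate $|f(s) - f(t)| \le 4|s - t|$ for $s, t \in [0,2]$.

```python
import random


def riemann_sum(f, points, tags):
    """S(f, P, T): the sum of f(t_i) * (x_i - x_{i-1}) over the subintervals of P."""
    return sum(
        f(t) * (right - left)
        for t, left, right in zip(tags, points, points[1:])
    )


def random_partition(a, b, delta, rng):
    """A randomly chosen partition of [a, b] whose norm is less than delta."""
    points = [a]
    while points[-1] < b:
        # Each step is at most 0.9 * delta, so every subinterval is narrower than delta.
        step = rng.uniform(0.1 * delta, 0.9 * delta)
        points.append(min(b, points[-1] + step))
    return points


def random_tags(points, rng):
    """One randomly chosen point in each subinterval [x_{i-1}, x_i]."""
    return [rng.uniform(left, right) for left, right in zip(points, points[1:])]


def worst_observed_error(f, a, b, K, delta, trials, seed=0):
    """Largest |S(f, P, T) - K| seen over randomly sampled P and T with norm below delta."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        P = random_partition(a, b, delta, rng)
        T = random_tags(P, rng)
        worst = max(worst, abs(riemann_sum(f, P, T) - K))
    return worst


epsilon = 0.01
delta = epsilon / 8  # our choice of delta for this particular function and interval
error = worst_observed_error(lambda x: x * x, 0.0, 2.0, 8 / 3, delta, trials=100)
print(error < epsilon)
```

Every sampled Riemann sum should lie within $\varepsilon$ of $K$ here, but no amount of sampling replaces the requirement that the inequality hold for every representative set of every sufficiently fine partition.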

The definition of the Riemann integral of a function f makes an implicit assump- 
tion, which is that if f is Riemann integrable, then the number K in the definition 
of Riemann integrability is unique; if that were not the case, then there would not 
be a single number that would be called “the Riemann integral” of the function on 
the given interval. Fortunately, as we see in the following lemma, this assumption of 
uniqueness is justified. 


Lemma 5.2.5. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
$f\colon [a,b] \to \mathbb{R}$ be a function. If $f$ is Riemann integrable, then there is a unique $K \in \mathbb{R}$ 
such that $\int_a^b f(x)\,dx = K$. 


Proof. Suppose that $f$ is Riemann integrable. Suppose further that $\int_a^b f(x)\,dx = K_1$ 
and $\int_a^b f(x)\,dx = K_2$ for some $K_1, K_2 \in \mathbb{R}$ such that $K_1 \ne K_2$. Let $\varepsilon = \frac{|K_2 - K_1|}{2}$. Then 
$\varepsilon > 0$. There is some $\delta_1 > 0$ such that if $P$ is a partition of $[a,b]$ with $\|P\| < \delta_1$, and 
if $T$ is a representative set of $P$, then $|S(f,P,T) - K_1| < \varepsilon$, and there is some $\delta_2 > 0$ 
such that if $Q$ is a partition of $[a,b]$ with $\|Q\| < \delta_2$, and if $S$ is a representative set of 
$Q$, then $|S(f,Q,S) - K_2| < \varepsilon$. Let $\delta = \min\{\delta_1, \delta_2\}$. Let $R$ be a partition of $[a,b]$ with 
$\|R\| < \delta$, which exists by Exercise 5.2.1, and let $V$ be a representative set of $R$. Then 

\begin{align*}
|K_2 - K_1| &= |K_2 - S(f,R,V) + S(f,R,V) - K_1| \\
&\le |K_2 - S(f,R,V)| + |S(f,R,V) - K_1| < \varepsilon + \varepsilon = |K_2 - K_1|, 
\end{align*}

which is a contradiction. Hence, if $f$ is Riemann integrable, then there is a unique 
$K \in \mathbb{R}$ such that $\int_a^b f(x)\,dx = K$. 


We will usually drop the word “Riemann” and just say “integrable” and “integral,” 
because we will not be dealing with any other type of integral in this text. Also, 
although we will not be using the term “definite integral” in this text, we will always 
mean “definite integral” when we say “integral,” unless it is clear from the context 
that we mean “indefinite integral,” a term that will be defined in Section 5.6. 

Although we use the standard notation $\int_a^b f(x)\,dx$ to denote the Riemann integral, 
the reader might well ask why we write “$f(x)$,” given that the name of the function is 
actually “$f$” (recall the brief discussion of this matter following Definition 2.5.10), 
and why we have the “$dx$” at all. Those would be very sensible questions. In fact, 
there are texts that simply write $\int_a^b f$ to denote the Riemann integral of the function $f$ 
on the interval $[a,b]$, and doing so is quite reasonable. We use the traditional notation 
$\int_a^b f(x)\,dx$ simply for the sake of familiarity with what the reader has already seen 
in calculus (and other) courses. It is important to note, however, that the “$x$” in this 
notation is a “dummy variable,” and that it has no intrinsic meaning. We could just as 
well write $\int_a^b f(y)\,dy$ to mean the same integral. The standard notation for integrals 
that we use, due to Gottfried von Leibniz (1646-1716), is meant to remind us of 
Riemann sums, which have the form $\sum_{i=1}^{n} f(t_i)(x_i - x_{i-1})$. The symbol “$\int$” is simply 
an elongated letter “S,” and it stands for sum, as does the Greek letter “$\Sigma$”; the “$dx$” is 
meant to remind us of $(x_i - x_{i-1})$, which is sometimes abbreviated “$\Delta x_i$,” though we 
will not be using that notation. In practice, the “$dx$” is convenient because it helps us 
keep track of things when we do substitution. 

In most cases, it is tricky (or virtually impossible) to show that a function is 
integrable using only the definition of integrability. Nonetheless, we now see a few 
examples where the definition of integrability can be used directly. 


Example 5.2.6. 


(1) Let $c \in \mathbb{R}$, and let $f\colon [a,b] \to \mathbb{R}$ be defined by $f(x) = c$ for all $x \in [a,b]$. We 
will show that $f$ is integrable, and that $\int_a^b f(x)\,dx = c(b-a)$. Let $P = \{x_0, x_1, \ldots, x_n\}$ 
be a partition of $[a,b]$, and let $T = \{t_1, t_2, \ldots, t_n\}$ be a representative set of $P$. Then 

\[
S(f,P,T) = \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}) = c \sum_{i=1}^{n} (x_i - x_{i-1}) = c(x_n - x_0) = c(b-a). 
\]

Given that all Riemann sums of this function have the same value, it is evident that $f$ 
is integrable and that $\int_a^b f(x)\,dx = c(b-a)$. 
(2) Let $g\colon [0,1] \to \mathbb{R}$ be defined by 

\[
g(x) = \begin{cases} 7, & \text{if } x = 0 \\ 0, & \text{if } x \in (0,1]. \end{cases}
\]

We will show that $g$ is integrable, and that $\int_0^1 g(x)\,dx = 0$. Let $\varepsilon > 0$. Let $\delta = \frac{\varepsilon}{7}$. Let 
$P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[0,1]$ with $\|P\| < \delta$, and let $T = \{t_1, t_2, \ldots, t_n\}$ be 
a representative set of $P$. Then $g(t_1)$ might equal 7 or it might equal 0, which means 
$|g(t_1)| \le 7$, and $g(t_i) = 0$ for $i \in \{2, 3, \ldots, n\}$. Therefore 

\[
|S(g,P,T) - 0| = \left| \sum_{i=1}^{n} g(t_i)(x_i - x_{i-1}) \right| = |g(t_1)| \cdot |x_1 - x_0| < 7\delta = \varepsilon. 
\]

It follows that $g$ is integrable and that $\int_0^1 g(x)\,dx = 0$. 
(3) Let $r\colon [0,1] \to \mathbb{R}$ be defined by 

$$r(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0, & \text{otherwise.} \end{cases}$$

It was shown in Example 3.3.3 (6) that r is discontinuous everywhere. We will now 
show that this function is not integrable. Suppose to the contrary that r is integrable. 
Let $\varepsilon = \frac{1}{2}$. We will obtain a contradiction by showing that there is no $\delta$ that “works” 
for this $\varepsilon$. Let $\delta > 0$. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of [0,1] with $\|P\| < \delta$. 
There are now two cases. First, suppose that $\int_0^1 r(x)\,dx \le \frac{1}{2}$. By Theorem 2.6.13 (1) 
we can choose a representative set $T = \{t_1, t_2, \ldots, t_n\}$ of P such that $t_1, t_2, \ldots, t_n$ are 
all rational numbers. It was seen in Example 5.2.3 (2) that S(r,P,T) = 1. Hence 

$$S(r,P,T) - \int_0^1 r(x)\,dx \ge 1 - \frac{1}{2} = \frac{1}{2},$$

which implies that 

$$\left| S(r,P,T) - \int_0^1 r(x)\,dx \right| \ge \frac{1}{2}.$$

Second, suppose that $\int_0^1 r(x)\,dx > \frac{1}{2}$. By Theorem 2.6.13 (2) we can choose a 
representative set $T = \{t_1, t_2, \ldots, t_n\}$ of P such that $t_1, t_2, \ldots, t_n$ are all irrational numbers. 
It was seen in Example 5.2.3 (2) that S(r,P,T) = 0. Hence 

$$\left| S(r,P,T) - \int_0^1 r(x)\,dx \right| = \int_0^1 r(x)\,dx > \frac{1}{2}.$$

We have therefore shown that for any $\delta > 0$, there is some partition P of [0,1] with $\|P\| < \delta$, and 
some representative set T of P, such that 

$$\left| S(r,P,T) - \int_0^1 r(x)\,dx \right| \ge \frac{1}{2} = \varepsilon,$$

which is a contradiction to our assumption that r is integrable. 
(4) Let $s\colon [0,1] \to \mathbb{R}$ be defined by 

$$s(x) = \begin{cases} \frac{1}{q}, & \text{if } x \in \mathbb{Q} \cap [0,1] \text{ and } x = \frac{p}{q} \text{ in lowest terms, where } p \in \mathbb{N} \cup \{0\} \text{ and } q \in \mathbb{N} \\ 0, & \text{otherwise.} \end{cases}$$

The use of the expression “lowest terms” was discussed in Example 3.3.3 (7), where it 
was shown that s is discontinuous at every rational number in [0,1], but it is continuous 
at every irrational number in [0,1]. It is not entirely evident intuitively whether or not 
the function s is integrable, in that it has more discontinuities than the function g in 
Part (2) of this example, but fewer discontinuities than the function r in Part (3) of 
this example. It turns out that s is integrable and that $\int_0^1 s(x)\,dx = 0$, though we need 
a slightly trickier argument than for any of the previous parts of this example. 

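Let $\varepsilon > 0$. We will choose our $\delta$ soon, but first we need a preliminary step. By 
Corollary 2.6.8 (1) there is some $q_0 \in \mathbb{N}$ such that $q_0 > \max\{2, \frac{2}{\varepsilon}\}$. Then $\frac{1}{q_0} < \frac{\varepsilon}{2}$. Let 

$$A = \left\{ x \in [0,1] \;\middle|\; s(x) \ge \tfrac{1}{q_0} \right\}.$$

Then $x \in A$ if and only if $x \in \mathbb{Q} \cap [0,1]$ and $x = \frac{p}{q}$ in lowest terms, where $p \in \mathbb{N} \cup \{0\}$ 
and $q \in \mathbb{N}$ and $q \le q_0$. Because $1 \in A$ we know that $A \ne \emptyset$, and it is seen that A is 
finite. Let M be the number of elements in A. Then $M \ge 1$. 

Let $\delta = \frac{\varepsilon}{4M}$. Then $\delta > 0$. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of [0,1] with 
$\|P\| < \delta$, and let $T = \{t_1, t_2, \ldots, t_n\}$ be a representative set of P. Let $i \in \{1, \ldots, n\}$. If 
$t_i \notin \mathbb{Q}$ then $s(t_i) = 0$; if $t_i \in A$ then $\frac{1}{q_0} \le s(t_i) \le 1$; and if $t_i \in \mathbb{Q} - A$ then $0 < s(t_i) < \frac{1}{q_0}$. 
Each element of A belongs to at most two of the subintervals of P, and hence $t_i \in A$ for at most $2M$ values of $i$. 
Using Exercise 2.5.3, we see that 

$$\begin{aligned}
|S(s,P,T) - 0| &= \left| \sum_{i=1}^{n} s(t_i)(x_i - x_{i-1}) \right| \\
&\le \sum_{\substack{i \in \{1,\ldots,n\} \\ t_i \notin \mathbb{Q}}} |s(t_i)| \cdot |x_i - x_{i-1}| + \sum_{\substack{i \in \{1,\ldots,n\} \\ t_i \in A}} |s(t_i)| \cdot |x_i - x_{i-1}| + \sum_{\substack{i \in \{1,\ldots,n\} \\ t_i \in \mathbb{Q} - A}} |s(t_i)| \cdot |x_i - x_{i-1}| \\
&\le \sum_{\substack{i \in \{1,\ldots,n\} \\ t_i \notin \mathbb{Q}}} 0 \cdot |x_i - x_{i-1}| + \sum_{\substack{i \in \{1,\ldots,n\} \\ t_i \in A}} 1 \cdot \delta + \sum_{\substack{i \in \{1,\ldots,n\} \\ t_i \in \mathbb{Q} - A}} \frac{1}{q_0}\, |x_i - x_{i-1}| \\
&\le 0 + 2M \cdot 1 \cdot \delta + \frac{1}{q_0} \sum_{i=1}^{n} (x_i - x_{i-1}) \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2}\,(x_n - x_0) = \varepsilon.
\end{aligned}$$
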

(5) Let $v\colon [0,1] \to \mathbb{R}$ be defined by 

$$v(x) = \begin{cases} 0, & \text{if } x = 0 \\ 1, & \text{if } x \in (0,1]. \end{cases}$$

The function v is integrable, as the reader is asked to show in Exercise 5.2.3. Let 
$s\colon [0,1] \to \mathbb{R}$ be the function given in Part (4) of this example; we saw that s is 
integrable. By abuse of notation we can view s as a function $s\colon [0,1] \to [0,1]$, and 
hence we can form the composition $v \circ s$. Then 

$$(v \circ s)(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0, & \text{otherwise.} \end{cases}$$

Hence $v \circ s = r$, where r is the function given in Part (3) of this example. It was shown 
that r is not integrable, and we therefore see that the composition of two integrable 
functions can be a non-integrable function. This situation contrasts with the fact that 
the composition of continuous functions is continuous (Theorem 3.3.8 (3)), and the 
composition of differentiable functions is differentiable (Corollary 4.3.4). ♢ 
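
The Riemann sums in Parts (1) and (4) of Example 5.2.6 can also be explored numerically. The following Python sketch is only an illustration and not a substitute for the proofs. The names `riemann_sum`, `uniform_partition` and `thomae` are our own choices, exact rational arithmetic is assumed via `fractions.Fraction`, and only rational representative points are used, so the sketch cannot exhibit the irrational representative sets needed in Part (3).

```python
from fractions import Fraction

def riemann_sum(f, partition, tags):
    """S(f, P, T): sum of f(t_i) * (x_i - x_{i-1}) over the subintervals of P."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(partition, partition[1:], tags))

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length (illustrative helper)."""
    a, b = Fraction(a), Fraction(b)
    return [a + (b - a) * Fraction(i, n) for i in range(n + 1)]

def thomae(x):
    """The function s of Part (4) at a rational x; Fraction stores p/q in lowest terms."""
    return Fraction(1, Fraction(x).denominator)

# Part (1): every Riemann sum of the constant function c equals c(b - a).
c = Fraction(3)
P = uniform_partition(2, 5, 10)
print(riemann_sum(lambda x: c, P, P[1:]))

# Part (4): with right-endpoint (rational) tags, the sums shrink as ||P|| shrinks.
for n in (10, 100, 1000):
    P = uniform_partition(0, 1, n)
    print(n, float(riemann_sum(thomae, P, P[1:])))
```

The right-endpoint tags are a convenient choice here because they are all rational, which is the worst case for the function s; any other representative set could be passed in instead.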


There are two basic questions that arise about integrals: which functions are 
integrable, and how do we integrate the integrable ones? Over the course of this chapter 
we will deal with the first question thoroughly and the second question partially. For 
derivatives, the intuitive picture is quite clear—a function is differentiable intuitively 
if its graph has no “corners.” It is much harder to get an intuitive picture of integrable 
functions, because, as we saw in Example 5.2.6 (4), a function can be rather strange, 
and in particular it can be discontinuous at many points, and yet still be integrable. 
The precise nature of “how badly discontinuous” a function can be and yet still be 
integrable will be clarified in Section 5.8. As for computing integrals of integrable 
functions, again the situation is much more complicated than for derivatives. Not 
all rules for computing derivatives have exact analogs for integrals, for example the 
Product Rule, the Quotient Rule and the Chain Rule. In principle, as we will see 
in Theorem 5.4.11, any continuous function defined on a closed bounded interval 
is integrable. However, there is no guarantee that such an integral can be computed 
in practice. Some such integrals are very hard to compute, and others impossible to 
compute exactly, and only numerical approximations can be obtained; see Section 5.6, 
after the statement of Corollary 5.6.3, for some references. 


Reflections 


It is customary in most real analysis texts to have the chapter on integrals (by 
which we mean “definite integrals”) follow the chapter on derivatives, for the simple 
reason that integrals are harder to define, and harder to prove theorems about, than 
derivatives. However, because the Fundamental Theorem of Calculus allows us to 
evaluate integrals by using antiderivatives, and because of the similarity of notation 
between definite integrals and indefinite integrals (which are simply antiderivatives), 
some beginning students mistakenly think that the definition of integrals is related 
to the definition of derivatives, which is most certainly not the case. In fact, the 
well-known calculus textbook [Apo67] treats integrals before derivatives, both for 
historical reasons and to clarify the relation between derivatives and integrals. 

There are two standard ways of defining the Riemann integral that are found in 
introductory textbooks on real analysis; one approach uses Riemann sums, and the 
other uses upper integrals and lower integrals. These two methods are completely 
equivalent, and the choice of method used in any text is simply a decision about what 
to take as the definition, and what to prove using the chosen definition. We use the 
Riemann sum approach in this text because that approach will be familiar to the reader 
from introductory calculus courses, where the informal discussion is always based 
upon Riemann sums. The definition of upper integrals and lower integrals is found in 
Section 5.4, and the proof that the Riemann sum approach is equivalent to the upper 
integral and lower integral approach is given in Theorem 5.4.10. We will use upper 
integrals and lower integrals in the rigorous treatment of area in Section 5.9. 

The definition of the Riemann integral in terms of Riemann sums uses an $\varepsilon$ and a 
$\delta$ in a way that is reminiscent of the $\varepsilon$-$\delta$ definition of limits of functions (given in 
Definition 3.2.1). In spite of this resemblance, however, the definition of the Riemann 
integral is not, strictly speaking, the limit of a function, in contrast to the definition 
of the derivative, which is precisely the limit of a function (and hence the definition 
of the derivative avoids direct mention of $\varepsilon$-$\delta$, because that is subsumed in the use 
of limits). The reason that the definition of the Riemann integral is not the limit of 
a function is that we need to take into account all possible partitions of the given 
closed bounded interval, and all possible representative sets of each partition, and 
the collection of all partitions and representative sets is not contained in the real 
numbers, and more importantly cannot be arranged in a linear order, in contrast to 
the real numbers. It would be nice if there were some way to generalize the notion 
of limits of functions so that the generalized notion contains as special cases both 
limits of functions and the type of $\varepsilon$-$\delta$ construction used in the definition of integrals, 
and in fact there is a way of doing that, using the idea of directed sets. See [Bea97] 
for an introductory treatment of real analysis using this approach. The advantage of 
using limits defined in the general context of directed sets is that a number of different 
definitions, for example limits of functions and Riemann integrals, are special cases of 
one general type of limit, and various analogous theorems that are proved separately 
in the standard approach can be proved only once in the context of the more general 
type of limit. The disadvantages of using directed sets are that doing so makes it 
more difficult to develop an intuitive understanding of limits of functions, Riemann 
integrals and the like, and that because students do not have to see similar definitions 
and proofs in different contexts, they are deprived of the chance to have their newly 
acquired skills at $\varepsilon$-$\delta$ proofs reinforced by constant practice. 


Exercises 


Exercise 5.2.1. [Used throughout.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded 
interval, and let $\varepsilon > 0$. Prove that there is a partition R of [a,b] such that $\|R\| < \varepsilon$. 


Exercise 5.2.2. [Used in Theorem 10.2.11.] Let [a,b] C R be a non-degenerate closed 
bounded interval, and let f,g: [a,b] + R be functions. Suppose that there is some 
M €N such that | f(x) — g(x)| < M for all x € [a,b]. Let P be a partition of [a,b], and 
let T be a representative set of P. Prove that |S(f,P,T) — S(g,P,T)| <M(b—a). 


Exercise 5.2.3. [Used in Example 5.2.6.] Let v: [0,1] — R be the function defined in 
Example 5.2.6 (5). Prove that v is integrable, using only the definition of integrability. 
Exercise 5.2.4. Let f: [0,3] — R be defined by 


_§5, ifxe [0,1] 
e={5 if x € (1,3). 


Using only the definition of integrability, prove that f is integrable. 


Exercise 5.2.5. [Used in Example 5.6.1 and Example 5.6.5.] Let $h\colon [0,2] \to \mathbb{R}$ be 
defined by 

$$h(x) = \begin{cases} 1, & \text{if } x \in [0,1] \\ 2, & \text{if } x \in (1,2]. \end{cases}$$


Using only the definition of integrability, prove that h is integrable. 


Exercise 5.2.6. [Used in Example 5.5.2 and Exercise 5.5.1.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate 
closed bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be defined by $f(x) = x$ for 
all $x \in [a,b]$. Using only the definition of integrability, prove that f is integrable and 
$\int_a^b f(x)\,dx = \frac{b^2 - a^2}{2}$. Use the fact that 

$$\sum_{i=1}^{n} \frac{x_i + x_{i-1}}{2}\,(x_i - x_{i-1}) = \frac{b^2 - a^2}{2},$$

which you should verify. 


Exercise 5.2.7. Give an example of a function $f\colon [0,1] \to \mathbb{R}$ such that f is not 
integrable, but |f| is integrable. 


Exercise 5.2.8. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and 
let $f,g,h\colon [a,b] \to \mathbb{R}$ be functions. Suppose that f and h are integrable, and that 
$\int_a^b f(x)\,dx = \int_a^b h(x)\,dx$. Prove that if $f(x) \le g(x) \le h(x)$ for all $x \in [a,b]$, then g is 
integrable and $\int_a^b g(x)\,dx = \int_a^b f(x)\,dx$. 


Exercise 5.2.9. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let 
$c \in \mathbb{R}$ and let $f\colon [a,b] \to \mathbb{R}$ be a function. Let $h\colon [a+c,b+c] \to \mathbb{R}$ be defined by 
$h(x) = f(x-c)$ for all $x \in [a+c,b+c]$. Prove that h is integrable if and only if f is 
integrable, and if they are integrable then $\int_{a+c}^{b+c} h(x)\,dx = \int_a^b f(x)\,dx$. 


Exercise 5.2.10. [Used in Exercise 5.5.4.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed 
bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Let $g\colon [-b,-a] \to \mathbb{R}$ be defined 
by $g(x) = f(-x)$ for all $x \in [-b,-a]$. Prove that g is integrable if and only if f is 
integrable, and if they are integrable then $\int_{-b}^{-a} g(x)\,dx = \int_a^b f(x)\,dx$. 


Exercise 5.2.11. [Used in Exercise 5.3.6, Exercise 5.3.7 and Exercise 5.5.11.] Let 
$[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let $f,g\colon [a,b] \to \mathbb{R}$ 
be functions. Suppose that g is increasing. If $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of [a,b], 
and if $T = \{t_1, t_2, \ldots, t_n\}$ is a representative set of P, the Riemann-Stieltjes sum of 
f with respect to P, T and g, denoted S(f,P,T,g), is defined by 

$$S(f,P,T,g) = \sum_{i=1}^{n} f(t_i)\,[g(x_i) - g(x_{i-1})].$$

Let $K \in \mathbb{R}$. The number K is the Riemann-Stieltjes integral of f with respect to g, 
written 

$$\int_a^b f\,dg = K$$

(also written $\int_a^b f(x)\,dg(x) = K$), if for each $\varepsilon > 0$, there is some $\delta > 0$ such that 
if P is a partition of [a,b] with $\|P\| < \delta$, and if T is a representative set of P, then 
$|S(f,P,T,g) - K| < \varepsilon$. If the Riemann-Stieltjes integral of f exists, we say that f is 
Riemann-Stieltjes integrable with respect to g. 

(1) Let $c \in (a,b)$. Suppose that g is defined by 

$$g(x) = \begin{cases} 0, & \text{if } x \in [a,c) \\ 1, & \text{if } x \in [c,b]. \end{cases}$$

Prove that if f is continuous at c, then f is Riemann-Stieltjes integrable with 
respect to g. 

(2) Let $c \in (a,b)$, and let g be as in Part (1) of this exercise. Let f = g. Prove that 
f is not Riemann-Stieltjes integrable with respect to g. 


See Exercise 5.5.11 for a clarification of the relation of the Riemann integral and 
the Riemann-Stieltjes integral, see [Sto01, Section 6.5] for a general discussion of 
the Riemann-Stieltjes integral and see [Ros80] for a rethinking of the definition of 
this type of integral. 


5.3 Elementary Properties of the Riemann Integral 


Having defined the concept of the integral in Section 5.2, we now turn to some 
elementary properties of integrals that follow directly from the definition. We will see 
some additional properties of integrals in Section 5.5, after we look more closely at 
the meaning of integrability in Section 5.4, which will provide us with a powerful 
tool to prove some results about integrals that would be very hard to prove directly 
from the definition of integration. 


Theorem 5.3.1. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let 
$f,g\colon [a,b] \to \mathbb{R}$ be functions and let $k \in \mathbb{R}$. Suppose that f and g are integrable. 

1. $f+g$ is integrable and $\int_a^b [f+g](x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$. 
2. $f-g$ is integrable and $\int_a^b [f-g](x)\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx$. 
3. $kf$ is integrable and $\int_a^b [kf](x)\,dx = k \int_a^b f(x)\,dx$. 
4. $\int_a^b k\,dx = k(b-a)$. 

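Proof. We prove Part (1); Part (2) is very similar to Part (1), and we omit the details. 
Part (3) is left to the reader in Exercise 5.3.1. Part (4) was proved in Example 5.2.6 (1), 
and is stated here simply for ease of reference. 

(1) Let $\varepsilon > 0$. There is some $\delta_1 > 0$ such that if P is a partition of [a,b] with 
$\|P\| < \delta_1$, and if T is a representative set of P, then $|S(f,P,T) - \int_a^b f(x)\,dx| < \frac{\varepsilon}{2}$, and 
there is some $\delta_2 > 0$ such that if Q is a partition of [a,b] with $\|Q\| < \delta_2$, and if W is a 
representative set of Q, then $|S(g,Q,W) - \int_a^b g(x)\,dx| < \frac{\varepsilon}{2}$. Let $\delta = \min\{\delta_1, \delta_2\}$. 

Let $R = \{x_0, x_1, \ldots, x_n\}$ be a partition of [a,b] with $\|R\| < \delta$, and let 
$V = \{v_1, v_2, \ldots, v_n\}$ be a representative set of R. Then 

$$\begin{aligned}
S(f+g,R,V) &= \sum_{i=1}^{n} [f+g](v_i)(x_i - x_{i-1}) = \sum_{i=1}^{n} [f(v_i) + g(v_i)](x_i - x_{i-1}) \\
&= \sum_{i=1}^{n} f(v_i)(x_i - x_{i-1}) + \sum_{i=1}^{n} g(v_i)(x_i - x_{i-1}) = S(f,R,V) + S(g,R,V).
\end{aligned}$$

Hence 

$$\begin{aligned}
\left| S(f+g,R,V) - \left[ \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \right] \right| &= \left| S(f,R,V) - \int_a^b f(x)\,dx + S(g,R,V) - \int_a^b g(x)\,dx \right| \\
&\le \left| S(f,R,V) - \int_a^b f(x)\,dx \right| + \left| S(g,R,V) - \int_a^b g(x)\,dx \right| \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{aligned}$$
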


We note that whereas we wrote “$\int_a^b [f+g](x)\,dx$” in Theorem 5.3.1 (1), it is quite 
common to write “$\int_a^b [f(x)+g(x)]\,dx$” to mean the same thing. The latter notation is 
not entirely proper, because the name of the function being integrated is “f + g,” and 
not “f(x) + g(x),” but fortunately this notation causes no harm. 

The reader will have noticed that missing from Theorem 5.3.1 is a statement 
concerning the integrability of the product or quotient of integrable functions. As 
we will see in Section 5.5, it is true that the product and quotient of integrable 
functions are integrable (under suitable hypotheses for quotients), but the proof of 
that fact requires more tools than we presently have at our disposal. The proof of the 
integrability of the sum of integrable functions was simple because of the formula 
S(f+g,R,V) = S(f,R,V) +S(g,R,V), but there is no comparable formula for the 
Riemann sum of a product or quotient of functions. Correspondingly, although we will 
prove in Section 5.5 that products and quotients of appropriate integrable functions are 
integrable, there are no nice formulas for the integrals of such products or quotients. 
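
The contrast between sums and products of functions can be seen on a small exact computation. The following Python sketch is an illustration only: the partition R, the representative set V and the functions f and g are arbitrary choices of ours, and `riemann_sum` is a hypothetical helper name. It checks that S(f+g,R,V) = S(f,R,V) + S(g,R,V) holds exactly, and that the analogous product formula fails for this data.

```python
from fractions import Fraction

def riemann_sum(f, partition, tags):
    """S(f, P, T) for a partition P = [x_0, ..., x_n] and tags T = [t_1, ..., t_n]."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(partition, partition[1:], tags))

# An arbitrary non-uniform partition R of [0, 2] and a representative set V of R.
R = [Fraction(0), Fraction(1, 3), Fraction(1), Fraction(7, 4), Fraction(2)]
V = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 2), Fraction(2)]

f = lambda x: x * x        # illustrative choice of f
g = lambda x: 3 * x + 1    # illustrative choice of g

sum_lhs = riemann_sum(lambda x: f(x) + g(x), R, V)
sum_rhs = riemann_sum(f, R, V) + riemann_sum(g, R, V)
print("additive formula holds:", sum_lhs == sum_rhs)

prod_lhs = riemann_sum(lambda x: f(x) * g(x), R, V)
prod_rhs = riemann_sum(f, R, V) * riemann_sum(g, R, V)
print("product formula holds:", prod_lhs == prod_rhs)
```

Exact rational arithmetic is used so that the equality test for sums is not disturbed by floating-point rounding.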

We now turn to some useful results concerning integrals and inequalities. The 
idea behind the third part of the following theorem is illustrated in Figure 5.3.1; the 
area under the curve is greater than the area of the more heavily shaded rectangle, 
which is m(b — a), and is less than the areas of the two shaded rectangles together, 
which is M(b— a). 
Fig. 5.3.1. 


Theorem 5.3.2. Let $[a,b] \subseteq \mathbb{R}$ be a closed bounded interval, and let $f,g\colon [a,b] \to \mathbb{R}$ 
be functions. Suppose that f and g are integrable. 

1. If $f(x) \ge 0$ for all $x \in [a,b]$, then $\int_a^b f(x)\,dx \ge 0$. 

2. If $f(x) \ge g(x)$ for all $x \in [a,b]$, then $\int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx$. 

3. Let $m, M \in \mathbb{R}$. If $m \le f(x)$ for all $x \in [a,b]$, then $m(b-a) \le \int_a^b f(x)\,dx$, and 
if $f(x) \le M$ for all $x \in [a,b]$, then $\int_a^b f(x)\,dx \le M(b-a)$. 


Proof. We will prove Part (1); each of the other parts will then follow from the 
previous part together with Theorem 5.3.1. 


(1) Suppose that $f(x) \ge 0$ for all $x \in [a,b]$. Let $\varepsilon > 0$. Then there is some $\delta > 0$ 
such that if P is a partition of [a,b] with $\|P\| < \delta$, and if T is a representative set 
of P, then $|S(f,P,T) - \int_a^b f(x)\,dx| < \varepsilon$. Let R be a partition of [a,b] with $\|R\| < \delta$, 
and let V be a representative set of R. Then $|S(f,R,V) - \int_a^b f(x)\,dx| < \varepsilon$, which 
implies that $\int_a^b f(x)\,dx > S(f,R,V) - \varepsilon$. Because $f(x) \ge 0$ for all $x \in [a,b]$, then 
clearly $S(f,R,V) \ge 0$. Hence $\int_a^b f(x)\,dx > -\varepsilon$. By Lemma 2.3.10 (2) we deduce that 
$\int_a^b f(x)\,dx \ge 0$. 


Recall the notion of a function being bounded, as defined in Definition 3.2.5. 
Is there a relation between integrability and boundedness? The function in Exam- 
ple 5.2.6 (3) is bounded but not integrable, so boundedness alone does not imply 
integrability. Does integrability imply boundedness? That is, must a function be 
bounded to be integrable? The reader who is familiar with the concept of an “im- 
proper integral,” as studied informally in many calculus courses, might suppose that 
the answer to this question is no, due to previously encountered examples of improper 
integrals. For example, the improper integral $\int_0^1 \frac{1}{\sqrt{x}}\,dx$ equals 2, as will be seen in 
Example 6.4.8 (1), even though the function being integrated is not bounded. We 
will discuss improper integrals in detail in Section 6.4, but for now we clarify that an 
improper integral is not evaluated directly as a Riemann integral on a closed bounded 
interval, but rather is evaluated as the limit of Riemann integrals on closed bounded 
subintervals of the original interval. For example, the improper integral $\int_0^1 \frac{1}{\sqrt{x}}\,dx$ is 
evaluated as a limit of integrals of the form $\int_s^1 \frac{1}{\sqrt{x}}\,dx$, where $s \in (0,1]$, and the function 
being integrated is bounded when restricted to each of the subintervals [s, 1]. As seen 
in the following theorem, if we restrict our attention to actual Riemann integrals on 
closed bounded intervals, then integrable functions are always bounded. 
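
The idea of evaluating an improper integral as a limit of Riemann integrals on subintervals [s, 1] can be illustrated numerically. The following Python sketch is a rough illustration only. It assumes the integrand $1/\sqrt{x}$, approximates each integral on [s, 1] by a Riemann sum with midpoints as representative points (the helper name `midpoint_sum` is ours), and compares the result with the value $2 - 2\sqrt{s}$ obtained from the antiderivative familiar from calculus, which has not yet been justified rigorously at this point in the text.

```python
import math

def midpoint_sum(f, a, b, n):
    """Riemann sum of f on [a, b] for the uniform partition with n subintervals,
    using the midpoint of each subinterval as its representative point."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def integrand(x):
    # Bounded on each [s, 1] with s > 0, but unbounded on (0, 1].
    return 1.0 / math.sqrt(x)

for s in (0.1, 0.01, 0.001):
    approx = midpoint_sum(integrand, s, 1.0, 100_000)
    from_calculus = 2.0 - 2.0 * math.sqrt(s)   # via the antiderivative 2*sqrt(x)
    print(s, approx, from_calculus)
```

As s decreases toward 0, both columns approach 2, even though no single Riemann integral on [0, 1] is being computed.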


Theorem 5.3.3. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
$f\colon [a,b] \to \mathbb{R}$ be a function. If f is integrable, then f is bounded. 


Proof. Suppose that f is integrable. Then there is some $\delta > 0$ such that if P is a 
partition of [a,b] with $\|P\| < \delta$, and if T is a representative set of P, then 
$|S(f,P,T) - \int_a^b f(x)\,dx| < \frac{1}{2}$. 

Let $Q = \{x_0, x_1, \ldots, x_q\}$ be a partition of [a,b] with $\|Q\| < \delta$, which exists by 
Exercise 5.2.1. Let $V = \{x_1, x_2, \ldots, x_q\}$, which is the “right-hand” representative set 
of Q. Let 

$$M = \max\left\{ |f(x_1)| + \frac{1}{x_1 - x_0},\ \ldots,\ |f(x_q)| + \frac{1}{x_q - x_{q-1}} \right\}.$$

Let $x \in [a,b]$. Then there is some $k \in \{1, \ldots, q\}$ such that $x \in [x_{k-1}, x_k]$. The 
number k will not be unique if x happens to be one of $x_1, x_2, \ldots, x_{q-1}$, but in that 
case we choose one of the values of k that works. Let $W = \{s_1, s_2, \ldots, s_q\}$ be the 
representative set of Q defined by $s_i = x_i$ if $i \ne k$, and $s_k = x$. Then V and W differ in 
at most one place, namely, at $x_k$ and $s_k$. Hence 

$$|S(f,Q,W) - S(f,Q,V)| = \left| [f(x) - f(x_k)](x_k - x_{k-1}) \right| = |f(x) - f(x_k)|\,(x_k - x_{k-1}).$$

On the other hand, we see that 

$$\begin{aligned}
|S(f,Q,W) - S(f,Q,V)| &= \left| S(f,Q,W) - \int_a^b f(x)\,dx + \int_a^b f(x)\,dx - S(f,Q,V) \right| \\
&\le \left| S(f,Q,W) - \int_a^b f(x)\,dx \right| + \left| \int_a^b f(x)\,dx - S(f,Q,V) \right| \\
&< \frac{1}{2} + \frac{1}{2} = 1.
\end{aligned}$$

It follows that 

$$|f(x) - f(x_k)|\,(x_k - x_{k-1}) < 1,$$

and hence 

$$|f(x) - f(x_k)| < \frac{1}{x_k - x_{k-1}}.$$

Using Lemma 2.3.9 (7) we deduce that 

$$|f(x)| < |f(x_k)| + \frac{1}{x_k - x_{k-1}} \le M.$$

Hence f is bounded. 

Because of Theorem 5.3.3, there will be no loss of generality in restricting our 
attention to bounded functions from now on in our study of integration. The assump- 
tion of boundedness will allow for some very useful technicalities that we will see in 
Section 5.4. 


Reflections 


Given that the definition of the Riemann integral is a bit tricky, what is striking 
about the present section is how straightforward, relatively speaking, the proofs are. 
Of course, in mathematics straightforward does not necessarily mean easy; in this 
context it means that the proofs follow from the definition without any additional 
concepts or devious tricks. The reader should not, however, be fooled into thinking 
that all proofs involving integrals are straightforward. Indeed, the reason this section 
is so short is that we have included only those theorems about integrals that have 
straightforward proofs, and there are not very many of those. This section should be 
viewed as a warm up, with the main action involving integrals about to start in the 
next section. 


Exercises 


Exercise 5.3.1. [Used in Theorem 5.3.1.] Prove Theorem 5.3.1 (3). 

Exercise 5.3.2. [Used in Exercise 5.5.6.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed 
bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that f is integrable. 
Prove that if $|f(x)| \le M$ for all $x \in [a,b]$, for some $M \in \mathbb{R}$, then 
$\left| \int_a^b f(x)\,dx \right| \le M(b-a)$. 


Exercise 5.3.3. [Used in Exercise 5.3.4, Example 5.5.2, Section 6.4, Exercise 6.4.3 and 
Example 10.2.4.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
$f,g\colon [a,b] \to \mathbb{R}$ be functions. 

(1) Suppose that f is zero except at one point. Prove that f is integrable and that 
$\int_a^b f(x)\,dx = 0$. 

(2) Suppose that f is zero except at finitely many points. Prove that f is integrable 
and that $\int_a^b f(x)\,dx = 0$. 

(3) Suppose that f and g are equal except at finitely many points. Prove that 
f is integrable if and only if g is integrable, and if they are integrable then 
$\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$. 

Exercise 5.3.4. [Used in Exercise 5.5.7 and Theorem 5.8.5.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate 
closed bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. 

(1) Suppose that there is some non-degenerate closed bounded interval $[c,d] \subseteq [a,b]$, 
and some $k \in \mathbb{R}$, such that $f(x) = k$ if $x \in (c,d)$ and $f(x) = 0$ if 
$x \in [a,b] - [c,d]$. It does not matter what values f has at x = c and x = d. Prove 
that f is integrable and that $\int_a^b f(x)\,dx = k(d-c)$. 

(2) The function f is a step function if there is a partition $P = \{x_0, x_1, \ldots, x_n\}$ 
of [a,b], and numbers $k_1, \ldots, k_n \in \mathbb{R}$, such that for each $i \in \{1, \ldots, n\}$, if 
$x \in (x_{i-1}, x_i)$ then $f(x) = k_i$. It does not matter what values f has at the 
elements of P. See Figure 5.3.2. Prove that if f is a step function, then f is 
integrable and $\int_a^b f(x)\,dx = \sum_{i=1}^{n} k_i (x_i - x_{i-1})$. [Use Exercise 5.3.3 (3).] 


Fig. 5.3.2. 


Exercise 5.3.5. [Used in Exercise 5.4.8.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed 
bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that there is some 
$M > 0$ and some $\delta > 0$ such that if P and Q are partitions of [a,b] with $\|P\| < \delta$ and 
$\|Q\| < \delta$, and if T is a representative set of P, and V is a representative set of Q, then 
$|S(f,P,T) - S(f,Q,V)| < M$. Prove that f is bounded. 


Exercise 5.3.6. This exercise makes use of Exercise 5.2.11. 


(1) State and prove the analog of Theorem 5.3.1 (1) for Riemann-Stieltjes integrals. 

(2) State and prove the analog of Theorem 5.3.1 (4) for Riemann-Stieltjes integrals. 


Exercise 5.3.7. This exercise makes use of Exercise 5.2.11. 


(1) Let [a,b] C R be a non-degenerate closed bounded interval, and let f,g : 
[a,b] — R be functions. Suppose that g is strictly increasing, and that f is 
Riemann-Stieltjes integrable with respect to g. Prove that f is bounded. 

(2) Give examples of functions f,g: [0,1] — R such that g is increasing, that f is 
Riemann-Stieltjes integrable with respect to g and that f is not bounded. 


5.4 Upper Sums and Lower Sums 


Although the definition of the Riemann integral via Riemann sums is intuitively 
appealing in that it corresponds to the way integrals are treated in calculus courses, 
from a technical point of view Riemann sums are not always easy to work with. In 
particular, there are some useful properties of integrals, such as those that will be 
proved in Section 5.5, that would be quite difficult to prove directly using the definition 
of the Riemann integral. Fortunately, these results can be proved more easily using 
an alternative characterization of integrability that is given in Theorem 5.4.7 below. 
The material in this section is somewhat technical, and the proof of Theorem 5.4.7 
is lengthier than anything we have previously encountered concerning integrals, but 
there does not appear to be any nice way of proving all the important properties of 
integrals without first going through the material in this section. 
We start with some preliminary definitions and lemmas. 


Definition 5.4.1. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
P and Q be partitions of [a,b]. The partition Q is a refinement of P if $P \subseteq Q$. △ 


Example 5.4.2. The sets P = {0,5,1}, and Q = {0, 3,4, 7,1} and R = {0,4, 3,1} 
are partitions of [0,1]. Then Q is a refinement of P, but R is not a refinement of P. ♢ 


Lemma 5.4.3. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let P 
and Q be partitions of [a,b]. 


1. $P \cup Q$ is a partition of [a,b], and $P \cup Q$ is a refinement of each of P and Q. 
2. If Q is a refinement of P, then $\|Q\| \le \|P\|$. 


Proof. Left to the reader in Exercise 5.4.1. 


The essence of the alternative characterization of integrability, to be given in 
Theorem 5.4.7, is that for a function to be integrable, its values cannot vary too much 
when restricted to sufficiently small subintervals of its domain. For example, the 
values of a continuous function (which we will see later by Theorem 5.4.11 must 
be integrable) will not vary much on a sufficiently small interval—that is what the 
$\varepsilon$-$\delta$ definition of continuity says. On the other hand, the values of the function r 
given in Example 5.2.6 (3), which is not integrable, vary the same amount on any 
subinterval, no matter how small. We saw in Example 5.2.6 (2) (4) that it is possible 
for discontinuous functions to be integrable, so the obstacle to the integrability of 
the function r given in Example 5.2.6 (3) is not its discontinuity per se, but must be 
something else, and that something is precisely how the values of the function vary in 
small subintervals. 

The idea of measuring precisely how much the values of a function vary on an 
interval can be made precise (see [TBB01, Section 6.7]), but we will not need this 
concept, and instead we proceed as follows. For a bounded function defined on a non- 
degenerate closed bounded interval, we will define the “upper sum” and “lower sum” 
of the function with respect to each partition of the interval, where these sums are 
similar to Riemann sums, but where the height of each rectangle represents, intuitively, 
the highest and lowest possible values respectively of the function on each subinterval 
of the partition. We then capture the idea of whether or not a function varies too 
much on sufficiently small subintervals by looking at the difference between the upper 
sum and lower sum when partitions with smaller and smaller norms are used. More 
specifically, Theorem 5.4.7 will say that a function is integrable if and only if the 
difference between the upper sum and lower sum can be made as small as desired if 
we choose partitions with sufficiently small norm. 

Before we can state and prove Theorem 5.4.7, we need another definition and a 
lemma. Recall that by Theorem 5.3.3, it is no restriction in our study of integration 
that we consider only bounded functions for the rest of this section. 


Definition 5.4.4. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let 
$f\colon [a,b] \to \mathbb{R}$ be a function and let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of [a,b]. Suppose 
that f is bounded. 

1. For each $i \in \{1, \ldots, n\}$, let 

$$M_i(f) = \operatorname{lub} f([x_{i-1}, x_i]) \quad\text{and}\quad m_i(f) = \operatorname{glb} f([x_{i-1}, x_i]).$$

If it is necessary to indicate the partition being used, we will write $M_i^P(f)$ and 
$m_i^P(f)$. 

2. The upper sum of f with respect to P, denoted U(f,P), is defined by 

$$U(f,P) = \sum_{i=1}^{n} M_i(f)(x_i - x_{i-1}),$$

and the lower sum of f with respect to P, denoted L(f,P), is defined by 

$$L(f,P) = \sum_{i=1}^{n} m_i(f)(x_i - x_{i-1}).$$ △ 


Observe in Definition 5.4.4 that each set of the form $f([x_{i-1}, x_i])$ is non-empty, 
and it is bounded because the function f is bounded, and therefore the Least Upper 
Bound Property and the Greatest Lower Bound Property imply that $f([x_{i-1}, x_i])$ has a least 
upper bound and a greatest lower bound. The function f in this definition need not 
be continuous, however, and hence the functions of the form $f|_{[x_{i-1}, x_i]}$ need not have 
maximum values or minimum values (which is why we need to use least upper bounds 
and greatest lower bounds in the definition). Because the numbers $M_i(f)$ and $m_i(f)$ 
need not equal the value of f at any number in $[x_{i-1}, x_i]$, we observe that upper sums 
and lower sums are not necessarily Riemann sums. 


Example 5.4.5. 


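(1) Let $f\colon [-1,1] \to \mathbb{R}$ be defined by $f(x) = x^2$ for all $x \in [-1,1]$. Let 
$P = \{-1, -\frac{1}{2}, 0, \frac{1}{2}, 1\}$, which is a partition of [−1,1]. Then 

$$U(f,P) = 1 \cdot \frac{1}{2} + \left(-\frac{1}{2}\right)^2 \cdot \frac{1}{2} + \left(\frac{1}{2}\right)^2 \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} = \frac{5}{4}$$

and 

$$L(f,P) = \left(-\frac{1}{2}\right)^2 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} + \left(\frac{1}{2}\right)^2 \cdot \frac{1}{2} = \frac{1}{4}.$$

(2) Let $g\colon [0,1] \to \mathbb{R}$ be defined by 

$$g(x) = \begin{cases} 7, & \text{if } x = 0 \\ 0, & \text{if } x \in (0,1]. \end{cases}$$

It was shown in Example 5.2.6 (2) that g is integrable. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a 
partition of [0,1]. Then $U(g,P) = 7(x_1 - x_0)$ and $L(g,P) = 0$. 

(3) Let $r\colon [0,1] \to \mathbb{R}$ be defined by 

$$r(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0, & \text{otherwise.} \end{cases}$$

It was shown in Example 5.2.6 (3) that r is not integrable. Let $P = \{x_0, x_1, \ldots, x_n\}$ be 
a partition of [0,1]. Then U(r,P) = 1 and L(r,P) = 0. ♢ 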
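
The upper and lower sums of Part (1) of Example 5.4.5 can be recomputed mechanically. The following Python sketch is an illustration under stated assumptions: it is specific to $f(x) = x^2$, for which the least upper bound and greatest lower bound on a subinterval can be read off from the endpoints and from whether the subinterval contains 0, and the helper names `sup_inf_square` and `upper_lower_sums` are ours. For a general bounded function these bounds cannot be found by evaluating finitely many points.

```python
from fractions import Fraction

def sup_inf_square(x0, x1):
    """lub and glb of x**2 on [x0, x1], using the shape of the parabola."""
    hi = max(x0 * x0, x1 * x1)
    lo = Fraction(0) if x0 <= 0 <= x1 else min(x0 * x0, x1 * x1)
    return hi, lo

def upper_lower_sums(partition):
    """Return (U(f, P), L(f, P)) for f(x) = x**2."""
    U = L = Fraction(0)
    for x0, x1 in zip(partition, partition[1:]):
        M_i, m_i = sup_inf_square(x0, x1)
        U += M_i * (x1 - x0)
        L += m_i * (x1 - x0)
    return U, L

P = [Fraction(-1), Fraction(-1, 2), Fraction(0), Fraction(1, 2), Fraction(1)]
U, L = upper_lower_sums(P)
print(U, L)   # prints: 5/4 1/4
```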


Although upper sums and lower sums are not necessarily Riemann sums them- 
selves, we see in Part (1) of the following lemma that every Riemann sum for a given 
partition is squeezed between the upper sum and lower sum for that partition. That 
fact is what makes upper sums and lower sums so useful. 


Lemma 5.4.6. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let 
$f\colon [a,b] \to \mathbb{R}$ be a function and let P be a partition of [a,b]. Suppose that f is 
bounded. 


1. If T is a representative set of P, then $L(f,P) \le S(f,P,T) \le U(f,P)$. 

2. If R is a refinement of P, then $L(f,P) \le L(f,R) \le U(f,R) \le U(f,P)$. 

3. If Q is a partition of [a,b], then $L(f,P) \le U(f,Q)$. 
Proof. 

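(1) Let $T$ be a representative set of $P$. Suppose that $P = \{x_0, x_1, \ldots, x_n\}$ and
$T = \{t_1, t_2, \ldots, t_n\}$. For each $i \in \{1, \ldots, n\}$, we know that
$t_i \in [x_{i-1}, x_i]$, and therefore

\[
m_i(f) = \operatorname{glb} f([x_{i-1}, x_i]) \le f(t_i) \le \operatorname{lub} f([x_{i-1}, x_i]) = M_i(f).
\]

Hence

\[
\sum_{i=1}^{n} m_i(f)(x_i - x_{i-1}) \le \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}) \le \sum_{i=1}^{n} M_i(f)(x_i - x_{i-1}),
\]

which means that $L(f,P) \le S(f,P,T) \le U(f,P)$.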


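(2) Let $R$ be a refinement of $P$. Suppose that $P = \{x_0, x_1, \ldots, x_n\}$ and
$R = \{y_0, y_1, \ldots, y_k\}$, where $\{x_0, x_1, \ldots, x_n\} \subseteq \{y_0, y_1, \ldots, y_k\}$.

Let $i \in \{1, \ldots, n\}$. Then there are $s, t \in \{1, \ldots, k\}$ such that
$x_{i-1} = y_{s-1}$ and $x_i = y_t$. Hence
$[x_{i-1}, x_i] = [y_{s-1}, y_s] \cup [y_s, y_{s+1}] \cup \cdots \cup [y_{t-1}, y_t]$.
If $j \in \{s, \ldots, t\}$, then $f([y_{j-1}, y_j]) \subseteq f([x_{i-1}, x_i])$, and therefore
by Exercise 2.6.1 (2) we see that
$m_i^P(f) = \operatorname{glb} f([x_{i-1}, x_i]) \le \operatorname{glb} f([y_{j-1}, y_j]) = m_j^R(f)$.
It follows that

\[
m_i^P(f)(x_i - x_{i-1}) = m_i^P(f)\left[(y_s - y_{s-1}) + \cdots + (y_t - y_{t-1})\right]
\le \sum_{j=s}^{t} m_j^R(f)(y_j - y_{j-1}),
\]

and hence

\[
L(f,P) = \sum_{i=1}^{n} m_i^P(f)(x_i - x_{i-1}) \le \sum_{j=1}^{k} m_j^R(f)(y_j - y_{j-1}) = L(f,R).
\]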


A similar argument shows that U(f,R) < U(f,P), and we omit the details. By 
Part (1) of this lemma we know that L(f,R) < U(f,R), and the proof is complete. 


(3) By Lemma 5.4.3 (1) we know that PU Q is a refinement of each of P and Q, 
and hence Part (2) of this lemma implies 


\[
L(f,P) \le L(f, P \cup Q) \le U(f, P \cup Q) \le U(f,Q).
\]
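
The three inequalities of Lemma 5.4.6 can be observed on a concrete case. The sketch below is only an illustration under stated assumptions: it uses the increasing function $f(x) = x^3$ on $[0,1]$ (so that the greatest lower bound and least upper bound on each subinterval are the values at the left and right endpoints), and the partitions `P`, `R`, `Q` and the helper names are chosen here for illustration and do not come from the text.

```python
from fractions import Fraction as F

def f(x):
    # Increasing on [0, 1], so glb/lub on a subinterval sit at its endpoints.
    return x ** 3

def lower_sum(partition):
    return sum(f(l) * (r - l) for l, r in zip(partition, partition[1:]))

def upper_sum(partition):
    return sum(f(r) * (r - l) for l, r in zip(partition, partition[1:]))

def riemann_sum(partition, representatives):
    widths = [r - l for l, r in zip(partition, partition[1:])]
    return sum(f(t) * w for t, w in zip(representatives, widths))

P = [F(0), F(1, 2), F(1)]
R = [F(0), F(1, 4), F(1, 2), F(3, 4), F(1)]   # a refinement of P
Q = [F(0), F(1, 3), F(2, 3), F(1)]            # neither refines nor is refined by P
T = [(l + r) / 2 for l, r in zip(R, R[1:])]   # midpoints as a representative set of R

# Parts (1) and (2): L(f,P) <= L(f,R) <= S(f,R,T) <= U(f,R) <= U(f,P).
chain = [lower_sum(P), lower_sum(R), riemann_sum(R, T), upper_sum(R), upper_sum(P)]
print(chain)
print(all(x <= y for x, y in zip(chain, chain[1:])))

# Part (3): a lower sum never exceeds an upper sum, even for unrelated partitions.
print(lower_sum(P) <= upper_sum(Q), lower_sum(Q) <= upper_sum(P))
```

A numerical check of this kind is of course not a proof; it only shows the inequalities at work for one function and three partitions.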


We are now ready to state and prove the major result in this section, Theo- 
rem 5.4.7, which gives an alternative characterization of what it means for a function 
to be integrable. There are actually two variants of this characterization given in the 
theorem, where the characterization given in Part (c) is the more useful, but we need 
to prove the other variant, given in Part (b), in order to prove Part (c). The proof of 
this theorem is somewhat lengthy, but the value of the theorem, which we will see 
later in this section and in Section 5.5, makes the long proof worth the effort. 

The definition of the integral, given in Definition 5.2.4, is relatively straightforward 
from an intuitive point of view, but it has one major disadvantage. In order to prove 
that a given function is integrable via that definition, it is necessary first to guess 
what the integral actually equals (that is the number K in Definition 5.2.4). In some 
situations, however, we might want to prove that a function is integrable in principle
even though we have no idea what the actual value of the integral is. For example, we 
will prove that all continuous functions on non-degenerate closed bounded intervals 
are integrable, even though we have no hope of finding a general formula for the 
integrals of all continuous functions. The beauty of Theorem 5.4.7 is that it gives a 
characterization of integrability in terms of upper sums and lower sums getting closer 
and closer to each other, without having to specify what number (which would be the 
value of the integral) the upper sums and lower sums are getting closer to. 
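
To make the idea concrete, the following sketch measures the gap $U(f,P) - L(f,P)$ without ever referring to the value of the integral. It rests on assumptions that are not part of the text: the function is increasing (so the bounds on each subinterval are endpoint values), the partitions are uniform, and the names `gap` and `pieces_needed` are hypothetical.

```python
from fractions import Fraction as F

def gap(f, a, b, n):
    """U(f,P) - L(f,P) for an increasing f on the uniform partition of [a,b]
    into n subintervals; on each piece M_i - m_i = f(right) - f(left)."""
    h = (b - a) / n
    points = [a + i * h for i in range(n + 1)]
    return sum((f(r) - f(l)) * (r - l) for l, r in zip(points, points[1:]))

def pieces_needed(f, a, b, eps):
    """Double n until the gap drops below eps; no guess of the integral is used."""
    n = 1
    while gap(f, a, b, n) >= eps:
        n *= 2
    return n

square = lambda x: x * x          # increasing on [0, 2]
print(gap(square, F(0), F(2), 8))                    # prints 1
print(pieces_needed(square, F(0), F(2), F(1, 100)))  # prints 1024
```

For an increasing function on a uniform partition the gap telescopes to $[f(b) - f(a)](b-a)/n$, which is why doubling $n$ eventually pushes it below any given tolerance.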


Theorem 5.4.7. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is bounded. The following are
equivalent.


a. The function $f$ is integrable.

b. For each $\varepsilon > 0$, there is some $\delta > 0$ such that if $P$ is a partition of
$[a,b]$ with $\|P\| < \delta$, then $U(f,P) - L(f,P) < \varepsilon$.

c. For each $\varepsilon > 0$, there is some partition $P$ of $[a,b]$ such that
$U(f,P) - L(f,P) < \varepsilon$.

Proof.


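(a) ⇒ (b) Suppose that $f$ is integrable.
Let $\varepsilon > 0$. By the definition of integrability, there is some $\delta > 0$ such that
if $R$ is a partition of $[a,b]$ with $\|R\| < \delta$, and if $T$ is a representative set of
$R$, then

\[
\left| S(f,R,T) - \int_a^b f(x)\,dx \right| < \frac{\varepsilon}{4}.
\]

Let $P = \{z_0, z_1, \ldots, z_k\}$ be a partition of $[a,b]$ with $\|P\| < \delta$. Let
$i \in \{1, \ldots, k\}$. By the definition of $M_i(f)$ and $m_i(f)$, together with
Lemma 2.6.5, there are $c_i, d_i \in [z_{i-1}, z_i]$ such that

\[
M_i(f) - \frac{\varepsilon}{4(b-a)} < f(c_i) \le M_i(f)
\quad\text{and}\quad
m_i(f) \le f(d_i) < m_i(f) + \frac{\varepsilon}{4(b-a)}.
\]

Hence

\[
|m_i(f) - f(d_i)| < \frac{\varepsilon}{4(b-a)}
\quad\text{and}\quad
|M_i(f) - f(c_i)| < \frac{\varepsilon}{4(b-a)}.
\]

Let $C = \{c_1, c_2, \ldots, c_k\}$ and $D = \{d_1, d_2, \ldots, d_k\}$. Then $C$ and $D$ are
representative sets of $P$.
Using Exercise 2.5.3, we see that

\[
\begin{aligned}
|U(f,P) - S(f,P,C)| &= \left| \sum_{i=1}^{k} M_i(f)(z_i - z_{i-1}) - \sum_{i=1}^{k} f(c_i)(z_i - z_{i-1}) \right| \\
&\le \sum_{i=1}^{k} |M_i(f) - f(c_i)|(z_i - z_{i-1})
 < \sum_{i=1}^{k} \frac{\varepsilon}{4(b-a)}(z_i - z_{i-1}) = \frac{\varepsilon}{4}.
\end{aligned}
\]

A similar argument shows that

\[
|S(f,P,D) - L(f,P)| < \frac{\varepsilon}{4}.
\]

Therefore

\[
\begin{aligned}
U(f,P) - L(f,P) &= |U(f,P) - L(f,P)| \\
&= \left| U(f,P) - S(f,P,C) + S(f,P,C) - \int_a^b f(x)\,dx + \int_a^b f(x)\,dx - S(f,P,D) + S(f,P,D) - L(f,P) \right| \\
&\le |U(f,P) - S(f,P,C)| + \left| S(f,P,C) - \int_a^b f(x)\,dx \right|
   + \left| \int_a^b f(x)\,dx - S(f,P,D) \right| + |S(f,P,D) - L(f,P)| \\
&< \frac{\varepsilon}{4} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \varepsilon.
\end{aligned}
\]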


(b) ⇒ (a) Suppose that for each $\varepsilon > 0$, there is some $\delta > 0$ such that if $P$
is a partition of $[a,b]$ with $\|P\| < \delta$, then $U(f,P) - L(f,P) < \varepsilon$.
Let

\[
\mathcal{U} = \{U(f,P) \mid P \text{ is a partition of } [a,b]\},
\]

and

\[
\mathcal{L} = \{L(f,P) \mid P \text{ is a partition of } [a,b]\}.
\]

Then $\mathcal{U}$ and $\mathcal{L}$ are non-empty subsets of $\mathbb{R}$, because there exist
partitions of $[a,b]$, for example $P = \{a,b\}$. By Lemma 5.4.6 (3), we know that if
$L(f,Q) \in \mathcal{L}$ and $U(f,R) \in \mathcal{U}$, then $L(f,Q) \le U(f,R)$. Let $\mu > 0$.
By hypothesis, there is some $\beta > 0$ such that if $P$ is a partition of $[a,b]$ with
$\|P\| < \beta$, then $U(f,P) - L(f,P) < \mu$. Let $P$ be a partition of $[a,b]$ with
$\|P\| < \beta$, which exists by Exercise 5.2.1. Then $U(f,P) - L(f,P) < \mu$. We now see that
$\mathcal{L}$ and $\mathcal{U}$ satisfy the hypotheses of both parts of the No Gap Lemma
(Lemma 2.6.6), and therefore $\mathcal{L}$ has a least upper bound and $\mathcal{U}$ has a
greatest lower bound, and $\operatorname{lub} \mathcal{L} = \operatorname{glb} \mathcal{U}$.

Let $K = \operatorname{lub} \mathcal{L} = \operatorname{glb} \mathcal{U}$. Let $\varepsilon > 0$.
By hypothesis, there is some $\delta > 0$ such that if $P$ is a partition of $[a,b]$ with
$\|P\| < \delta$, then $U(f,P) - L(f,P) < \varepsilon$.

Now let $W$ be a partition of $[a,b]$ with $\|W\| < \delta$, and let $T$ be a representative
set of $W$. Then $U(f,W) - L(f,W) < \varepsilon$. By Lemma 5.4.6 (1) we know that
$L(f,W) \le S(f,W,T) \le U(f,W)$. We also know that
$K = \operatorname{lub} \mathcal{L} = \operatorname{glb} \mathcal{U}$, and hence
$L(f,W) \le K \le U(f,W)$. It follows that $|S(f,W,T) - K| \le U(f,W) - L(f,W)$. Because
$U(f,W) - L(f,W) < \varepsilon$, it follows that $|S(f,W,T) - K| < \varepsilon$. Hence $f$ is
integrable, with the integral of $f$ equal to $K$.


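(b) ⇒ (c) This implication is trivial because of Exercise 5.2.1.


(c) ⇒ (b) Suppose that for each $\varepsilon > 0$, there is some partition $P$ of $[a,b]$ such
that $U(f,P) - L(f,P) < \varepsilon$.

Let $\varepsilon > 0$. By hypothesis, there is some partition $Q = \{x_0, x_1, \ldots, x_n\}$ of
$[a,b]$ such that $U(f,Q) - L(f,Q) < \frac{\varepsilon}{2}$. Because $f$ is bounded, there is
some $B \in \mathbb{R}$ such that $|f(x)| \le B$ for all $x \in [a,b]$. We may assume that
$B > 0$. Let $\delta = \frac{\varepsilon}{4nB}$. Then $\delta > 0$.

Let $Z = \{z_0, z_1, \ldots, z_k\}$ be a partition of $[a,b]$ with $\|Z\| < \delta$. Let

\[
W = \{i \in \{1, \ldots, k\} \mid x_j \in (z_{i-1}, z_i) \text{ for some } j \in \{1, \ldots, n\}\}.
\]

For each $j \in \{1, \ldots, n\}$, let

\[
V_j = \{i \in \{1, \ldots, k\} \mid [z_{i-1}, z_i] \subseteq [x_{j-1}, x_j]\}.
\]

Then $W \cup V_1 \cup \cdots \cup V_n = \{1, \ldots, k\}$, and the sets $W, V_1, \ldots, V_n$ are
pairwise disjoint. Because $x_0 = a$ and $x_n = b$, it follows that $W$ has at most $n - 1$
elements.
For each $i \in \{1, \ldots, k\}$, it follows from Exercise 5.4.9 (4) that
$M_i^Z(f) - m_i^Z(f) \le 2B$. For each $j \in \{1, \ldots, n\}$, observe that
$\sum_{i \in V_j} (z_i - z_{i-1}) \le x_j - x_{j-1}$, and that if $i \in V_j$ then
$M_i^Z(f) - m_i^Z(f) \le M_j^Q(f) - m_j^Q(f)$.
Then

\[
\begin{aligned}
U(f,Z) - L(f,Z) &= \sum_{i=1}^{k} [M_i^Z(f) - m_i^Z(f)](z_i - z_{i-1}) \\
&= \sum_{i \in W} [M_i^Z(f) - m_i^Z(f)](z_i - z_{i-1}) + \sum_{j=1}^{n} \sum_{i \in V_j} [M_i^Z(f) - m_i^Z(f)](z_i - z_{i-1}) \\
&\le \sum_{i \in W} 2B(z_i - z_{i-1}) + \sum_{j=1}^{n} [M_j^Q(f) - m_j^Q(f)] \sum_{i \in V_j} (z_i - z_{i-1}) \\
&\le (n-1)2B\delta + \sum_{j=1}^{n} [M_j^Q(f) - m_j^Q(f)](x_j - x_{j-1}) \\
&= (n-1)2B\frac{\varepsilon}{4nB} + [U(f,Q) - L(f,Q)] < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{aligned}
\]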


It would have been nicer to have proved Theorem 5.4.7 by proving (a) ⇒ (b) ⇒ (c) ⇒ (a),
but unfortunately there does not appear to be a direct way of going from (c) to (a).

The proof of (b) ⇒ (a) in Theorem 5.4.7 suggests yet another characterization of
integrability, based upon the following definition.


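Definition 5.4.8. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval,
and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is bounded. The upper
integral of $f$, denoted $\overline{\int_a^b} f(x)\,dx$, is defined by

\[
\overline{\int_a^b} f(x)\,dx = \operatorname{glb}\{U(f,P) \mid P \text{ is a partition of } [a,b]\},
\]

and the lower integral of $f$, denoted $\underline{\int_a^b} f(x)\,dx$, is defined by

\[
\underline{\int_a^b} f(x)\,dx = \operatorname{lub}\{L(f,P) \mid P \text{ is a partition of } [a,b]\}. \qquad △
\]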


Observe that $\overline{\int_a^b} f(x)\,dx = \operatorname{glb} \mathcal{U}$ and
$\underline{\int_a^b} f(x)\,dx = \operatorname{lub} \mathcal{L}$ in the notation of the proof
of (b) ⇒ (a) in Theorem 5.4.7. The following lemma is derived immediately from
the arguments used in that proof, though using only Part (1) of the No Gap Lemma
(Lemma 2.6.6); we omit the details.


Lemma 5.4.9. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is bounded. Then the upper
integral and lower integral of $f$ always exist, and
$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx$.

We are now ready to state our additional characterization of integrability. 


Theorem 5.4.10. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval,
and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is bounded. Then $f$ is
integrable if and only if $\overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx$, and if
this equality holds then
$\overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx$.

Proof. Left to the reader in Exercise 5.4.13.


Some real analysis texts define the integrability of a function $f$ by saying that $f$
is integrable if $\overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx$, and then prove
as a theorem what we take as
the definition of integrability. The advantage of that alternative approach is that the 
proofs of some theorems about integrals can be reached rapidly, though at the price 
of a definition of integrability that is less intuitively meaningful, and less familiar to 
students who have taken calculus. 
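
The upper and lower integrals can be bracketed numerically from any finite family of partitions: every lower sum is a lower bound for the lower integral, and every upper sum is an upper bound for the upper integral. The sketch below illustrates this for $f(x) = x$ on $[0,1]$. It assumes an increasing function and uniform partitions, and the helper name `darboux_sums` is hypothetical.

```python
from fractions import Fraction as F

def darboux_sums(f, a, b, n):
    """(L, U) for an increasing f on the uniform partition of [a, b] into n pieces."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    lower = sum(f(left) * h for left in xs[:-1])    # glb on each piece: left endpoint
    upper = sum(f(right) * h for right in xs[1:])   # lub on each piece: right endpoint
    return lower, upper

identity = lambda x: x
sums = [darboux_sums(identity, F(0), F(1), n) for n in range(1, 51)]

best_lower = max(lower for lower, _ in sums)  # a lower bound for the lower integral
best_upper = min(upper for _, upper in sums)  # an upper bound for the upper integral
print(best_lower, best_upper)  # prints 49/100 51/100
```

Only finitely many partitions are examined, so this gives bounds on the lower and upper integrals rather than their exact values; here the bracket closes in on $\tfrac{1}{2}$ as more partitions are included.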

We conclude this section with the following important result, which is a nice 
application of Theorem 5.4.7. In contrast to differentiable functions, which are very 
well behaved (for example, we saw in Theorem 4.2.4 that all differentiable functions 
are continuous), integrable functions can be rather strange (for example, we saw in 
Example 5.2.6 (2) that integrable functions need not be continuous). However, as we 
now see, all continuous functions on non-degenerate closed bounded intervals are 
integrable. The reason we need to use Theorem 5.4.7 in the proof of the following 
theorem is that for an arbitrary continuous function there is no way to guess the value 
of the integral, and so we have no candidate for the number K in the original definition 
of integrals. 


Theorem 5.4.11. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval,
and let $f\colon [a,b] \to \mathbb{R}$ be a function. If $f$ is continuous, then $f$ is
integrable.


Proof. Suppose that $f$ is continuous. Let $\varepsilon > 0$. By Theorem 3.4.4 we see that $f$
is uniformly continuous. Hence there is some $\delta > 0$ such that $x, y \in [a,b]$ and
$|x - y| < \delta$ imply $|f(x) - f(y)| < \frac{\varepsilon}{b-a}$.

Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a,b]$ with $\|P\| < \delta$. Let
$i \in \{1, \ldots, n\}$. By Exercise 3.3.2 (2) the function $f|_{[x_{i-1}, x_i]}$ is
continuous. The Extreme Value Theorem (Theorem 3.5.1) applied to $f|_{[x_{i-1}, x_i]}$ implies
that there are $x^i_{\min}, x^i_{\max} \in [x_{i-1}, x_i]$ such that
$f(x^i_{\min}) \le f(x) \le f(x^i_{\max})$ for all $x \in [x_{i-1}, x_i]$. By Exercise 2.6.2 we
see that $M_i(f) = f(x^i_{\max})$ and $m_i(f) = f(x^i_{\min})$. Because $\|P\| < \delta$ we
know that $|x_i - x_{i-1}| < \delta$, and hence $|x^i_{\max} - x^i_{\min}| < \delta$. Therefore
$|f(x^i_{\max}) - f(x^i_{\min})| < \frac{\varepsilon}{b-a}$, and it follows that
$M_i(f) - m_i(f) < \frac{\varepsilon}{b-a}$. Then

\[
U(f,P) - L(f,P) = \sum_{i=1}^{n} [M_i(f) - m_i(f)](x_i - x_{i-1})
< \sum_{i=1}^{n} \frac{\varepsilon}{b-a}(x_i - x_{i-1}) = \frac{\varepsilon}{b-a}(b-a) = \varepsilon.
\]

By Theorem 5.4.7 (b) we conclude that $f$ is integrable.


It follows from Theorem 5.4.11 that many familiar functions, such as polynomials, 
exponentials, logarithms, sine and cosine, are all integrable on any non-degenerate 
closed bounded interval. A more detailed discussion of which functions are integrable 
is given in Section 5.8. 

As with the other main theorems of real analysis, Theorem 5.4.11 ultimately relies
upon the Least Upper Bound Property. Observe that this theorem would not be true
for continuous functions of the form $f\colon [a,b] \cap \mathbb{Q} \to \mathbb{R}$. For example,
the function $f\colon [0,2] \cap \mathbb{Q} \to \mathbb{Q}$ defined by
$f(x) = \frac{1}{x^2 - 2}$ for all $x \in [0,2] \cap \mathbb{Q}$ is continuous but
not bounded, and hence not integrable. The reader is encouraged to locate exactly
where the Least Upper Bound Property is used in our proof of Theorem 5.4.11 (it is
used in more than one place).


Reflections 


The transition from the previous section to this one is akin to the transition from a 
pleasant stroll to a hike up a steep hill; whereas the previous section had relatively 
straightforward theorems and proofs about integrals, this section has technical material 
the use of which will be apparent only in the subsequent section, and has a lengthier 
proof than anything we have previously encountered concerning integrals. In contrast 
to derivatives, where the proofs of the basic properties are not particularly difficult, 
there appears to be no way to prove some of the basic (and intuitive) properties of 
integrals without first going through some tricky technicalities. In a calculus course, 
where the properties of integrals are not given rigorous proofs, this tricky material 
can be glossed over, but not here. The nature of mathematics is such that, if it is done 
properly, we are required to be satisfied, at times, with delayed gratification. 


Exercises 


Exercise 5.4.1. [Used in Lemma 5.4.3.] Prove Lemma 5.4.3. 


Exercise 5.4.2. Find the upper sum and lower sum for each of the following functions 
with respect to the given partition. 


(1) Let f: [1,3] — R be defined by f(x) = : for all x € [1,3], and let P = 
{1,1.4,1.8,2.2,2.6,3}. 

(2) Let s be the function given in Example 5.2.6 (4), and let Q = {0, 4,2, 2, 3,1}. 
Exercise 5.4.3. [Used in Theorem 5.5.6 and Theorem 5.5.7.] Let $[a,b] \subseteq \mathbb{R}$ be a
non-degenerate closed bounded interval, let $c \in (a,b)$, let $f\colon [a,b] \to \mathbb{R}$ be
a function, let $P$ be a partition of $[a,c]$ and let $Q$ be a partition of $[c,b]$. Then
$P \cup Q$ is a partition of $[a,b]$. Prove that

\[
U(f, P \cup Q) = U(f|_{[a,c]}, P) + U(f|_{[c,b]}, Q)
\]

and

\[
L(f, P \cup Q) = L(f|_{[a,c]}, P) + L(f|_{[c,b]}, Q).
\]

Exercise 5.4.4. 


(1) Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$f\colon [a,b] \to \mathbb{R}$ be a function and let $P$ be a partition of $[a,b]$. Suppose
that $f$ is continuous. Prove that there are representative sets $S$ and $T$ of $P$ such
that $S(f,P,S) = L(f,P)$ and $S(f,P,T) = U(f,P)$.
(2) Find an example of a bounded function $g\colon [-1,1] \to \mathbb{R}$, and a partition $Q$
of $[-1,1]$, such that $U(g,Q)$ is not equal to any Riemann sum of $g$ with respect
to $Q$.

Exercise 5.4.5. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$f\colon [a,b] \to \mathbb{R}$ be a function and let $P$ be a partition of $[a,b]$. Suppose that
there are $m, M \in \mathbb{R}$ such that $m \le f(x) \le M$ for all $x \in [a,b]$. Prove that
$m(b-a) \le L(f,P) \le U(f,P) \le M(b-a)$.


Exercise 5.4.6. [Used in Theorem 5.8.5.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate
closed bounded interval, let $f, g\colon [a,b] \to \mathbb{R}$ be functions and let $P$ be a
partition of $[a,b]$. Suppose that $f$ and $g$ are bounded, and that $f(x) \le g(x)$ for all
$x \in [a,b]$. Prove that $L(f,P) \le L(g,P)$ and $U(f,P) \le U(g,P)$.

Exercise 5.4.7. [Used in Exercise 5.4.8.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate
closed bounded interval, let $f\colon [a,b] \to \mathbb{R}$ be a function and let $P$ be a
partition of $[a,b]$. Suppose that $f$ is bounded. Prove that


U(f,P) = lub{S(f,P,T) | T is a representative set of P}, 


and 
L(f,P) = glb{S(f,P,T) | T is a representative set of P}. 


[Use Exercise 2.6.6. ] 


Exercise 5.4.8. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f\colon [a,b] \to \mathbb{R}$ be a function. Prove that $f$ is integrable if and only if
for each $\varepsilon > 0$, there is some $\delta > 0$ such that if $P$ and $Q$ are partitions
of $[a,b]$ with $\|P\| < \delta$ and $\|Q\| < \delta$, and if $T$ is a representative set of
$P$, and $V$ is a representative set of $Q$, then $|S(f,P,T) - S(f,Q,V)| < \varepsilon$.
[Use Exercise 2.6.10, Exercise 5.3.5 and Exercise 5.4.7.]


Exercise 5.4.9. [Used in Theorem 5.4.7, Theorem 5.5.1, Theorem 5.8.5 and Lemma 6.4.9.] Let
$[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$f\colon [a,b] \to \mathbb{R}$ be a function and let $P = \{x_0, x_1, \ldots, x_n\}$ be a
partition of $[a,b]$. Suppose that $f$ is bounded. Let $i \in \{1, \ldots, n\}$.


(1) Let $y, z \in [x_{i-1}, x_i]$. Prove that $|f(y) - f(z)| \le M_i(f) - m_i(f)$.
(2) Prove that

\[
M_i(f) - m_i(f) = \operatorname{lub}\{|f(y) - f(z)| \mid y, z \in [x_{i-1}, x_i]\}.
\]

(3) Suppose that there is some $P \in \mathbb{R}$ such that $|f(x) - f(y)| \le P$ for all
$x, y \in [x_{i-1}, x_i]$. Prove that $M_i(f) - m_i(f) \le P$.
(4) Suppose that there is some $M \in \mathbb{R}$ such that $|f(x)| \le M$ for all
$x \in [x_{i-1}, x_i]$. Prove that $M_i(f) - m_i(f) \le 2M$.


Exercise 5.4.10. [Used in Exercise 5.4.11 and Theorem 5.5.7.] Let $[a,b] \subseteq \mathbb{R}$ be
a non-degenerate closed bounded interval, let $f\colon [a,b] \to \mathbb{R}$ be a function and
let $P$ be a partition of $[a,b]$. Suppose that $f$ is integrable. Prove that

\[
L(f,P) \le \int_a^b f(x)\,dx \le U(f,P).
\]

Exercise 5.4.11. [Used in Theorem 5.8.5.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate
closed bounded interval, and let $f, g\colon [a,b] \to \mathbb{R}$ be functions. Suppose that
$f$ is integrable, that $g$ is bounded, and that for each $\varepsilon > 0$, there is a
partition $P$ of $[a,b]$ such that $U(f,P) - L(f,P) < \varepsilon$ and
$L(f,P) \le L(g,P) \le U(g,P) \le U(f,P)$.


(1) Prove that $g$ is integrable.
(2) Prove that $\int_a^b g(x)\,dx = \int_a^b f(x)\,dx$. [Use Exercise 5.4.10.]


Exercise 5.4.12. [Used in Exercise 5.5.3 and Theorem 9.3.6.] Let $[a,b] \subseteq \mathbb{R}$ be
a non-degenerate closed bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function.
Prove that if $f$ is monotone, then $f$ is integrable.


Exercise 5.4.13. [Used in Theorem 5.4.10.] Prove Theorem 5.4.10. You may cite parts 
of the proof of Theorem 5.4.7 without repeating them. 


Exercise 5.4.14. Find the upper integral and lower integral for each of the following 
functions. 


(1) Let g be the function given in Example 5.2.6 (2). 
(2) Let r be the function given in Example 5.2.6 (3). 


Exercise 5.4.15. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval,
let $k \in \mathbb{R}$ and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$
is bounded.


(1) Prove that if k > 0, then f?[kf](x) dx = k[? f(x) dx. 


(2) Prove that if k <0, then [?[kf](x) dx =k? f(x) dx. 
_ [Use Exercise 3.2.15.] 


Exercise 5.4.16. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval,
and let $f, g\colon [a,b] \to \mathbb{R}$ be functions. Suppose that $f$ and $g$ are bounded.


(1) Prove that
$\overline{\int_a^b} [f+g](x)\,dx \le \overline{\int_a^b} f(x)\,dx + \overline{\int_a^b} g(x)\,dx$,
and give an example where the inequality is strict.
(2) Prove that
$\underline{\int_a^b} [f+g](x)\,dx \ge \underline{\int_a^b} f(x)\,dx + \underline{\int_a^b} g(x)\,dx$,
and give an example where the inequality is strict.


[Use Exercise 2.6.9 and Exercise 3.2.16.]


5.5 Further Properties of the Riemann Integral


We now discuss some additional properties of the Riemann integral, the proofs of
which rely upon Theorem 5.4.7. The first property involves the integrability of the
composition of functions. Recall from Example 5.2.6 (5) that the composition of
integrable functions need not be integrable. The following theorem, which will be
useful to us shortly, uses uniform continuity to circumvent the strange behavior seen
in that example.

In this theorem, we have functions $f\colon [a,b] \to \mathbb{R}$ and
$g\colon D \to \mathbb{R}$, where $f([a,b]) \subseteq D$, and we wish to form the composition
$g \circ f$. Because the codomain of $f$ is not equal to the domain of $g$, technically we
would first need to change $f$ into a function $[a,b] \to D$ before forming the composition,
but to avoid cumbersome writing we abuse notation and simply write $g \circ f$, which should
cause no confusion.


Theorem 5.5.1. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$D \subseteq \mathbb{R}$ be a set and let $f\colon [a,b] \to \mathbb{R}$ and
$g\colon D \to \mathbb{R}$ be functions. Suppose that $f$ is integrable, and that
$f([a,b]) \subseteq D$.


1. If $g$ is uniformly continuous and bounded, then $g \circ f$ is integrable.
2. If $D$ is a non-degenerate closed bounded interval and $g$ is continuous, then
$g \circ f$ is integrable.


Proof. 


(1) Suppose that $g$ is uniformly continuous and bounded. We will show that $g \circ f$
is integrable by showing that it satisfies the criterion given in Theorem 5.4.7 (c).

Let $\varepsilon > 0$. Because $g$ is bounded, there is some $N \in \mathbb{R}$ such that
$|g(x)| \le N$ for all $x \in D$. Observe that $N \ge 0$. Let
$\eta = \frac{\varepsilon}{2(b - a + 2N)}$. Then $\eta > 0$. Because $g$ is uniformly
continuous, there is some $\delta > 0$ such that $x, y \in D$ and $|x - y| < \delta$ imply
$|g(x) - g(y)| < \eta$. By taking a smaller value of $\delta$ if necessary, we may suppose that
$\delta \le \eta$. Because $f$ is integrable, we know by Theorem 5.4.7 (c) that there is some
partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a,b]$ such that $U(f,P) - L(f,P) < \delta^2$.

Let

\[
W = \{i \in \{1, \ldots, n\} \mid M_i(f) - m_i(f) < \delta\}
\]

and

\[
V = \{i \in \{1, \ldots, n\} \mid M_i(f) - m_i(f) \ge \delta\}.
\]

Then $W \cup V = \{1, \ldots, n\}$ and $W \cap V = \emptyset$.
Let $j \in W$. If $y, z \in [x_{j-1}, x_j]$, then by Exercise 2.6.10 (2) we see that
$|f(y) - f(z)| \le M_j(f) - m_j(f) < \delta$, and therefore
$|(g \circ f)(y) - (g \circ f)(z)| = |g(f(y)) - g(f(z))| < \eta$. By Exercise 5.4.9 (3) we
deduce that $M_j(g \circ f) - m_j(g \circ f) \le \eta$. Then

\[
\sum_{i \in W} [M_i(g \circ f) - m_i(g \circ f)](x_i - x_{i-1}) \le \eta \sum_{i \in W} (x_i - x_{i-1}) \le \eta(b - a).
\]

Let $k \in V$. Then $M_k(f) - m_k(f) \ge \delta$, and so $\frac{M_k(f) - m_k(f)}{\delta} \ge 1$.
Because $|g(x)| \le N$ for all $x \in D$, we see that if $s, t \in D$, then
$|g(s) - g(t)| \le |g(s)| + |g(t)| \le 2N$. It follows that if $y, z \in [x_{k-1}, x_k]$, then
$|(g \circ f)(y) - (g \circ f)(z)| = |g(f(y)) - g(f(z))| \le 2N$. Hence, using
Exercise 5.4.9 (3) again, we deduce that $M_k(g \circ f) - m_k(g \circ f) \le 2N$. Then

\[
\begin{aligned}
\sum_{i \in V} [M_i(g \circ f) - m_i(g \circ f)](x_i - x_{i-1}) &\le 2N \sum_{i \in V} (x_i - x_{i-1}) \\
&\le 2N \sum_{i \in V} \frac{M_i(f) - m_i(f)}{\delta}(x_i - x_{i-1}) \\
&\le \frac{2N}{\delta} \sum_{i=1}^{n} [M_i(f) - m_i(f)](x_i - x_{i-1}) \\
&= \frac{2N}{\delta}[U(f,P) - L(f,P)] \le \frac{2N}{\delta}\delta^2 = 2N\delta \le 2N\eta.
\end{aligned}
\]

Putting the above calculations together we see that

\[
\begin{aligned}
U(g \circ f, P) - L(g \circ f, P) &= \sum_{i=1}^{n} [M_i(g \circ f) - m_i(g \circ f)](x_i - x_{i-1}) \\
&= \sum_{i \in W} [M_i(g \circ f) - m_i(g \circ f)](x_i - x_{i-1})
 + \sum_{i \in V} [M_i(g \circ f) - m_i(g \circ f)](x_i - x_{i-1}) \\
&\le \eta(b - a) + 2N\eta = (b - a + 2N)\eta = \frac{\varepsilon}{2} < \varepsilon.
\end{aligned}
\]

Hence $g \circ f$ satisfies the criterion given in Theorem 5.4.7 (c).


(2) This part of the theorem follows immediately from Part (1) of this theorem 
together with Theorem 3.4.4 and Corollary 3.4.6. 


We now turn to the integrability of the product and quotient of integrable functions. 
There is no problem with products of integrable functions, as we will see below, but 
the situation is slightly trickier for quotients. We saw in Theorem 3.3.5 (5) that the 
quotient of continuous functions is continuous as long as the denominator is not 
zero, and we saw the analogous result for differentiability in Theorem 4.3.1 (5) (the 
Quotient Rule). Unfortunately, as we see in the following example, just knowing that 
the denominator is not zero is not sufficient to guarantee that the quotient of integrable 
functions is integrable. 


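Example 5.5.2. Let $f, g\colon [0,1] \to \mathbb{R}$ be defined by $f(x) = 1$ for all
$x \in [0,1]$, and

\[
g(x) = \begin{cases} 1, & \text{if } x = 0 \\ x, & \text{if } x \in (0,1]. \end{cases}
\]

Then

\[
\frac{f}{g}(x) = \begin{cases} 1, & \text{if } x = 0 \\ \frac{1}{x}, & \text{if } x \in (0,1]. \end{cases}
\]

We know by Example 5.2.6 (1) that $f$ is integrable. The function $g$ is also integrable,
as can be seen by combining Exercise 5.2.6 and Exercise 5.3.3 (3). However, even
though $g(x) \ne 0$ for all $x \in [0,1]$, the function $\frac{f}{g}$ is not integrable, because
integrable functions are bounded by Theorem 5.3.3, and yet $\frac{f}{g}$ is not bounded, a fact
that is evident by looking at the graph of $\frac{f}{g}$, and is proved in Example 3.2.6. ◊

The following definition allows us to avoid the problem seen in Example 5.5.2.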
The following definition allows us to avoid the problem seen in Example 5.5.2. 


Definition 5.5.3. Let $A \subseteq \mathbb{R}$ be a set, and let $f\colon A \to \mathbb{R}$ be a
function. The function $f$ is bounded away from zero if there is some $P > 0$ such that
$|f(x)| \ge P$ for all $x \in A$. △


If a function $f\colon A \to \mathbb{R}$ is bounded away from zero, then clearly
$f(x) \ne 0$ for all $x \in A$.




Theorem 5.5.4. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f, g\colon [a,b] \to \mathbb{R}$ be functions. Suppose that $f$ and $g$ are integrable.


1. $f^n$ is integrable for all $n \in \mathbb{N}$.
2. $fg$ is integrable.
3. If $g$ is bounded away from zero, then $\frac{f}{g}$ is integrable.


Proof. Because $f$ and $g$ are integrable, then by Theorem 5.3.3 we know that $f$ and $g$
are bounded. Hence there are $M_1, M_2 \in \mathbb{R}$ such that $|f(x)| \le M_1$ and
$|g(x)| \le M_2$ for all $x \in [a,b]$. We may suppose that $M_1 > 0$ and $M_2 > 0$. Let
$M = \max\{M_1, M_2\}$. Then $f([a,b]) \subseteq [-M, M]$ and $g([a,b]) \subseteq [-M, M]$.


(1) Let $n \in \mathbb{N}$. Let $h\colon [-M, M] \to \mathbb{R}$ be defined by $h(x) = x^n$ for
all $x \in [-M, M]$. Then $h$ is continuous by Example 3.3.7 (1). It now follows from
Theorem 5.5.1 (2) that $f^n = h \circ f$ is integrable.


(2) Because $f$ and $g$ are integrable, then $f + g$ is integrable by Theorem 5.3.1 (1).
By Part (1) of this theorem we know that $f^2$, $g^2$ and $(f + g)^2$ are all integrable.
Observe that

\[
fg = \frac{1}{2}\left[(f + g)^2 - f^2 - g^2\right].
\]

It follows from Theorem 5.3.1 (2) (3) that $fg$ is integrable.


(3) Suppose that $g$ is bounded away from zero. Hence there is some $P > 0$ such that
$|g(x)| \ge P$ for all $x \in [a,b]$. It follows that
$g([a,b]) \subseteq [-M, -P] \cup [P, M]$. Let $k\colon [-M, -P] \cup [P, M] \to \mathbb{R}$ be
defined by $k(x) = \frac{1}{x}$ for all $x \in [-M, -P] \cup [P, M]$. Then $k$ is continuous by
Example 3.3.3 (2). It follows from Exercise 3.4.6 that $k$ is uniformly continuous and
bounded. Using Theorem 5.5.1 (1) we see that $\frac{1}{g} = k \circ g$ is integrable.
We then use Part (2) of this theorem to deduce that $\frac{f}{g} = f \cdot \frac{1}{g}$ is
integrable.


Our next result concerns the integrability of the absolute value of a function. 


Theorem 5.5.5. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f\colon [a,b] \to \mathbb{R}$ be a function. If $f$ is integrable, then $|f|$ is
integrable and

\[
\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.
\]

Proof. Suppose that $f$ is integrable. By Theorem 5.3.3 we know that $f$ is bounded.
Hence there is some $M \in \mathbb{R}$ such that $|f(x)| \le M$ for all $x \in [a,b]$. We may
assume that $M > 0$. Hence $f([a,b]) \subseteq [-M, M]$. Let $h\colon [-M, M] \to \mathbb{R}$ be
defined by $h(x) = |x|$ for all $x \in [-M, M]$. We know by Exercise 3.3.1 (2) that $h$ is
continuous. It follows from Theorem 5.5.1 (2) that $|f| = h \circ f$ is integrable.

Observe that $-|f(x)| \le f(x) \le |f(x)|$ for all $x \in [a,b]$. It now follows from
Theorem 5.3.1 (3) and Theorem 5.3.2 (2) that

\[
-\int_a^b |f(x)|\,dx \le \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx.
\]

Because $|f(x)| \ge 0$ for all $x \in [a,b]$, then Theorem 5.3.2 (1) implies that
$\int_a^b |f(x)|\,dx \ge 0$. Hence

\[
\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.
\]

Theorem 5.5.5 can be viewed as an analog for integrals of the Triangle Inequality 
(Lemma 2.3.9 (6)) and its extension to finite sums in Exercise 2.5.3, where instead 
of the sum of finitely many numbers we have an integral, which for the sake of the 
analogy can be thought of intuitively as the infinite sum of all the values of the function 
f (though it is not really such a sum). As with the Triangle Inequality, we cannot 
in general replace the inequality in Theorem 5.5.5 with equality because if $f$ takes
on both positive and negative values, then there will be cancellation in
$\int_a^b f(x)\,dx$, whereas there will be no cancellation in $\int_a^b |f(x)|\,dx$.
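
The cancellation just described is already visible at the level of Riemann sums, where the inequality is simply the Triangle Inequality for finite sums. The following sketch is an illustration only: it uses $\sin$ on $[0, 2\pi]$ with midpoint representative sets, floating-point arithmetic, and a helper name `riemann_sum` chosen here, none of which come from the text.

```python
import math

def riemann_sum(f, a, b, n):
    """Riemann sum of f on the uniform partition of [a, b] into n pieces,
    using the midpoint of each subinterval as its representative."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = math.sin
a, b, n = 0.0, 2 * math.pi, 1000

signed = riemann_sum(f, a, b, n)                        # positive and negative parts cancel
absolute = riemann_sum(lambda x: abs(f(x)), a, b, n)    # no cancellation

print(abs(signed) <= absolute)   # the Triangle Inequality for finite sums
print(signed, absolute)          # roughly 0 and roughly 4
```

The first sum is essentially zero because the contributions over $[0,\pi]$ and $[\pi,2\pi]$ cancel, while the second is close to $4$, so the inequality in Theorem 5.5.5 is strict for this function.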

We now turn to the restriction of an integrable function to a subinterval of its 
domain. It might seem obvious that such a restriction of an integrable function is 
integrable, but the proof is non-trivial, making use of Theorem 5.4.7. What makes the 
proof less straightforward than might be expected is the fact that if one starts with 
a partition P of an interval [a,b] and a representative set T of P, and if [c,d] C [a,b], 
the numbers c and d might not be in P, and the numbers in T might not be in [c,d]. 


Theorem 5.5.6. Let $D \subseteq C \subseteq \mathbb{R}$ be non-degenerate closed bounded
intervals, and let $f\colon C \to \mathbb{R}$ be a function. If $f$ is integrable, then
$f|_D$ is integrable.


Proof. Suppose that $f$ is integrable. Let $C = [a,b]$ and $D = [c,d]$. If $c = a$ and $d = b$
then there is nothing to prove, so suppose that at least one of these equalities is false.
Without loss of generality, assume that $a < c$. Because $c < d$ then $a < d$. We will first
show that $f|_{[a,d]}$ is integrable. If $d = b$ then there is nothing to prove, so we suppose
that $d < b$. Let $\varepsilon > 0$. By Theorem 5.4.7 (c) there is some partition $P$ of $[a,b]$
such that $U(f,P) - L(f,P) < \varepsilon$. Let $Q = P \cup \{d\}$, let $R = Q \cap [a,d]$ and let
$Z = Q \cap [d,b]$. Then $Q$ is a partition of $[a,b]$ that is a refinement of $P$, and $R$ is a
partition of $[a,d]$, and $Z$ is a partition of $[d,b]$, and $Q = R \cup Z$. By Exercise 5.4.3
we know that

\[
U(f,Q) = U(f|_{[a,d]}, R) + U(f|_{[d,b]}, Z)
\]

and

\[
L(f,Q) = L(f|_{[a,d]}, R) + L(f|_{[d,b]}, Z).
\]

Lemma 5.4.6 (2) implies that $L(f,P) \le L(f,Q) \le U(f,Q) \le U(f,P)$. Hence
$U(f,Q) - L(f,Q) < \varepsilon$. By Lemma 5.4.6 (1) we see that
$U(f|_{[d,b]}, Z) - L(f|_{[d,b]}, Z) \ge 0$. Then

\[
\begin{aligned}
U(f|_{[a,d]}, R) - L(f|_{[a,d]}, R)
&\le [U(f|_{[a,d]}, R) - L(f|_{[a,d]}, R)] + [U(f|_{[d,b]}, Z) - L(f|_{[d,b]}, Z)] \\
&= U(f,Q) - L(f,Q) < \varepsilon.
\end{aligned}
\]

Therefore $f|_{[a,d]}$ satisfies the criterion in Theorem 5.4.7 (c), and hence $f|_{[a,d]}$ is
integrable.


A similar argument shows that $f|_{[c,d]} = (f|_{[a,d]})|_{[c,d]}$ is integrable, and we omit
the details.

Our next theorem is illustrated in Figure 5.5.1. The intuitive idea is that if we
want to integrate a function on an interval $[a,b]$, we can break up the interval into two
subintervals $[a,c]$ and $[c,b]$, integrate on each of the two subintervals and add these
two integrals. The proof, once again, is not as simple as might be expected.


Fig. 5.5.1. 


Theorem 5.5.7. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$c \in (a,b)$ and let $f\colon [a,b] \to \mathbb{R}$ be a function.


1. $f$ is integrable if and only if $f|_{[a,c]}$ and $f|_{[c,b]}$ are integrable.
2. If $f$ is integrable, then

\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]


Proof. 


(1) Suppose that $f$ is integrable. It then follows immediately from Theorem 5.5.6
that $f|_{[a,c]}$ and $f|_{[c,b]}$ are integrable.

Now suppose that $f|_{[a,c]}$ and $f|_{[c,b]}$ are integrable. Let $\varepsilon > 0$. By applying
the criterion in Theorem 5.4.7 (c) to each of $f|_{[a,c]}$ and $f|_{[c,b]}$, we know that there
is a partition $P_1$ of $[a,c]$ such that
$U(f|_{[a,c]}, P_1) - L(f|_{[a,c]}, P_1) < \frac{\varepsilon}{2}$ and a partition $P_2$ of
$[c,b]$ such that $U(f|_{[c,b]}, P_2) - L(f|_{[c,b]}, P_2) < \frac{\varepsilon}{2}$. Let
$P = P_1 \cup P_2$. Then $P$ is a partition of $[a,b]$. Using Exercise 5.4.3 we see that

\[
\begin{aligned}
U(f,P) - L(f,P) &= [U(f|_{[a,c]}, P_1) + U(f|_{[c,b]}, P_2)] - [L(f|_{[a,c]}, P_1) + L(f|_{[c,b]}, P_2)] \\
&= [U(f|_{[a,c]}, P_1) - L(f|_{[a,c]}, P_1)] + [U(f|_{[c,b]}, P_2) - L(f|_{[c,b]}, P_2)] \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{aligned}
\]

Therefore $f$ satisfies the criterion in Theorem 5.4.7 (c), and hence $f$ is integrable.


(2) Suppose that $f$ is integrable. Let $\varepsilon > 0$. Let $P_1$, $P_2$ and $P$ be as in the
proof of Part (1) of this theorem. Recall that
$U(f|_{[a,c]}, P_1) - L(f|_{[a,c]}, P_1) < \frac{\varepsilon}{2}$ and
$U(f|_{[c,b]}, P_2) - L(f|_{[c,b]}, P_2) < \frac{\varepsilon}{2}$. Therefore
$U(f|_{[a,c]}, P_1) < L(f|_{[a,c]}, P_1) + \frac{\varepsilon}{2}$ and
$U(f|_{[c,b]}, P_2) < L(f|_{[c,b]}, P_2) + \frac{\varepsilon}{2}$.
By Exercise 5.4.10 we know that

\[
L(f,P) \le \int_a^b f(x)\,dx \le U(f,P),
\]

and similarly for $f|_{[a,c]}$ and $f|_{[c,b]}$. Using Exercise 5.4.3 we see that

\[
\begin{aligned}
\int_a^b f(x)\,dx \le U(f,P) &= U(f|_{[a,c]}, P_1) + U(f|_{[c,b]}, P_2) \\
&< L(f|_{[a,c]}, P_1) + \frac{\varepsilon}{2} + L(f|_{[c,b]}, P_2) + \frac{\varepsilon}{2} \\
&\le \int_a^c f(x)\,dx + \int_c^b f(x)\,dx + \varepsilon.
\end{aligned}
\]

A similar argument starting with $\int_a^b f(x)\,dx \ge L(f,P)$ shows that

\[
\int_a^b f(x)\,dx > \int_a^c f(x)\,dx + \int_c^b f(x)\,dx - \varepsilon.
\]

Hence

\[
\left[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \right] - \varepsilon < \int_a^b f(x)\,dx
< \left[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \right] + \varepsilon,
\]

which is equivalent to

\[
\left| \int_a^b f(x)\,dx - \left[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \right] \right| < \varepsilon.
\]

Because $\varepsilon$ was arbitrarily chosen, it now follows from Lemma 2.3.10 (3) that

\[
\left| \int_a^b f(x)\,dx - \left[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \right] \right| = 0.
\]

In Theorem 5.5.7 it is assumed that $c$ is between $a$ and $b$, but in practice it is
sometimes useful to allow $c$ to be outside $[a,b]$ as well; we will see such a need in
the proof of the Fundamental Theorem of Calculus Version I (Theorem 5.6.2). To be
able to handle that situation, we first need to consider integrals of the form
$\int_p^q f(x)\,dx$, where $q$ is not greater than $p$. The following definition is precisely
what is needed to extend Theorem 5.5.7 to the case where $c$ is not necessarily between $a$
and $b$.



Definition 5.5.8. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval,
and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is integrable. Let
$\int_b^a f(x)\,dx$ be defined by

\[
\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx,
\]

and let $\int_a^a f(x)\,dx$ be defined by

\[
\int_a^a f(x)\,dx = 0. \qquad △
\]

An intuitive way to think of the definition of $\int_a^b f(x)\,dx$ when $b < a$ versus when
$a < b$ is by thinking of an integral as giving a positive value if, when you travel on
the $x$-axis from $a$ to $b$, the function is to your left.


Corollary 5.5.9. Let $C \subseteq \mathbb{R}$ be a closed bounded interval, and let
$f\colon C \to \mathbb{R}$ be a function. Let $a, b, c \in C$. If $f$ is integrable, then

\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]


Proof. Left to the reader in Exercise 5.5.2. 
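
The sign convention of Definition 5.5.8 is what makes the additivity formula hold even when $c$ lies outside $[a,b]$. The sketch below illustrates this with the linear function $f(x) = 2x + 1$. It relies on an assumption not made in the text: integrals are evaluated by midpoint Riemann sums, which happen to be exact for linear functions, so exact rational arithmetic can be used. The names `midpoint_sum` and `oriented_integral` are hypothetical.

```python
from fractions import Fraction as F

def midpoint_sum(f, a, b, n=10):
    """Midpoint Riemann sum of f on [a, b] with a < b; exact when f is linear."""
    h = (b - a) / n
    return sum(f(a + (2 * i + 1) * h / 2) for i in range(n)) * h

def oriented_integral(f, p, q):
    """Integral from p to q following Definition 5.5.8:
    zero when p == q, and the negative of the integral from q to p when q < p."""
    if p == q:
        return F(0)
    if p < q:
        return midpoint_sum(f, p, q)
    return -midpoint_sum(f, q, p)

f = lambda x: 2 * x + 1
a, b, c = F(0), F(3), F(5)    # c lies outside [a, b]

lhs = oriented_integral(f, a, b)
rhs = oriented_integral(f, a, c) + oriented_integral(f, c, b)
print(lhs, rhs)  # prints 12 12
```

Here the integral from $0$ to $5$ is $30$ and the integral from $5$ to $3$ is $-18$, so the two pieces combine to the integral from $0$ to $3$, as the corollary asserts.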


Reflections 


On first encounter, the properties of the Riemann integral presented in this section 
do not appear, from their statements, to be of a different nature than the properties 
presented in Section 5.3; the properties in both sections seem intuitively reasonable, 
and are clearly useful. The sole reason for dividing the properties of the Riemann 
integral into these two sections is that the proofs of the properties in this section 
require the technicalities about upper sums and lower sums proved in Section 5.4, 
specifically Theorem 5.4.7, whereas the properties in Section 5.3 can be proved using 
only the definition of the Riemann integral. What is disturbing intuitively is that it 
was not obvious ahead of time which properties of the Riemann integral would turn 
out to have straightforward proofs and which would rely upon a technical result such 
as Theorem 5.4.7. When trying to prove a new theorem, it is always worth trying the 
most straightforward approach first, but, unfortunately, it is only with hindsight that 
one can know if such an approach will work. 


Exercises 


Exercise 5.5.1. Use Theorem 5.5.1 (2) to give an alternative proof of Theorem 5.4.11. 
Make sure you do not indirectly use Theorem 5.4.11 in your proof. 
[Use Exercise 5.2.6.] 


Exercise 5.5.2. [Used in Corollary 5.5.9.] Prove Corollary 5.5.9. 


Exercise 5.5.3. [Used in Exercise 7.2.3.] Let [a,b] ⊆ R be a non-degenerate closed
bounded interval, and let f: [a,b] → R be a function. Suppose that f is decreasing.
Exercise 5.4.12 implies that f is integrable. Prove that if f is strictly decreasing, then

\[
f(b)(b-a) < \int_a^b f(x)\,dx < f(a)(b-a).
\]


Exercise 5.5.4. Let a ∈ (0,∞), and let f: [−a,a] → R be a function. Suppose that f
is integrable.


(1) Suppose that f(−x) = f(x) for all x ∈ [−a,a]; such a function is called an
even function. Prove that $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$.
(2) Suppose that f(−x) = −f(x) for all x ∈ [−a,a]; such a function is called an
odd function. Prove that $\int_{-a}^{a} f(x)\,dx = 0$.
[Use Exercise 5.2.10.]


Exercise 5.5.5. [Used in Theorem 5.9.17.] Let [a,b] ⊆ R be a non-degenerate closed
bounded interval, and let f: [a,b] → R be a function. Suppose that f is bounded.


(1) Suppose that f is continuous except possibly at a single point in [a,b]. Prove 
that f is integrable. 

(2) Suppose that f is continuous except possibly at finitely many points in [a, b]. 
Prove that f is integrable. 


Exercise 5.5.6. [Used in Section 6.4 and Lemma 6.4.9.] Let [a,b] ⊆ R be a
non-degenerate closed bounded interval, and let f: [a,b] → R be a function. Suppose that f is
integrable. Theorem 5.5.6 implies that $f|_{[a,t]}$ is integrable for each t ∈ (a,b). Prove that

\[
\lim_{t \to b^-} \int_a^t f(x)\,dx = \int_a^b f(x)\,dx.
\]

[Use Exercise 5.3.2.]


Exercise 5.5.7. [Used in Section 5.8, Theorem 7.4.5 and Exercise 7.4.3.] Let [a,b] ⊆ R
be a non-degenerate closed bounded interval, and let f: [a,b] → R be a function.
Suppose that f(x) ≥ 0 for all x ∈ [a,b], and that f is continuous. Prove that if
$\int_a^b f(x)\,dx = 0$, then f(x) = 0 for all x ∈ [a,b]. Example 5.2.6 (2) shows that the
requirement of continuity cannot be dropped. [Use Exercise 5.3.4.]


Exercise 5.5.8. Let [a,b] ⊆ R be a non-degenerate closed bounded interval, and let
f: [a,b] → R be a function. Suppose that f is continuous. Prove that if $\int_a^c f(x)\,dx = 0$
for all c ∈ [a,b], then f(x) = 0 for all x ∈ [a,b].


Exercise 5.5.9. Let [a,b] ⊆ R be a non-degenerate closed bounded interval, and let
f,g: [a,b] → R be functions. Suppose that f and g are integrable, and that at least
one of $\int_a^b f^2(x)\,dx$ or $\int_a^b g^2(x)\,dx$ is not zero. Prove that

\[
\left[\int_a^b [fg](x)\,dx\right]^2 \le \left[\int_a^b f^2(x)\,dx\right]\left[\int_a^b g^2(x)\,dx\right].
\]

(The integrals in this inequality all exist by Theorem 5.5.4.) This result is known
as the Cauchy-Schwarz Inequality for Integrals. The hypothesis that at least one of
$\int_a^b f^2(x)\,dx$ or $\int_a^b g^2(x)\,dx$ is not zero is not actually necessary to prove this inequality,
but it makes the problem simpler.

To prove the inequality, without loss of generality assume that $\int_a^b g^2(x)\,dx \neq 0$, and
observe that (f − cg)² is integrable for any c ∈ R by Theorem 5.3.1 and Theorem 5.5.4.
At some point in the proof choose a useful value of c.


Exercise 5.5.10. Let [a,b] ⊆ R be a non-degenerate closed bounded interval, and
let f,g: [a,b] → R be functions. Suppose that f is continuous, that g is integrable
and that g(x) ≥ 0 for all x ∈ [a,b]. Theorem 5.4.11 implies that f is integrable, and
Theorem 5.5.4 (2) then implies that fg is integrable.


(1) By the Extreme Value Theorem (Theorem 3.5.1) there are $x_{\min}, x_{\max}$ ∈ [a,b]
such that $f(x_{\min}) \le f(x) \le f(x_{\max})$ for all x ∈ [a,b]. Prove that

\[
f(x_{\min}) \int_a^b g(x)\,dx \le \int_a^b [fg](x)\,dx \le f(x_{\max}) \int_a^b g(x)\,dx.
\]

(2) Prove that there is some c ∈ [a,b] such that $\int_a^b [fg](x)\,dx = f(c) \int_a^b g(x)\,dx$.
This result is known as the Generalized Mean Value Theorem for Integrals.


Exercise 5.5.11. This exercise makes use of Exercise 5.2.11. Let [a,b] ⊆ R be a
non-degenerate closed bounded interval, and let f,g: [a,b] → R be functions.


(1) Suppose that f is integrable, and that g is increasing and continuously
differentiable. Prove that f is Riemann-Stieltjes integrable with respect to g, and
that $\int_a^b f(x)\,dg = \int_a^b f(x)g'(x)\,dx$.

(2) Find an example to show that if the requirement that g is continuously
differentiable in Part (1) of this exercise is changed to differentiable, it will
not necessarily be the case that fg' is integrable, and therefore it will not be
possible to express $\int_a^b f(x)\,dg$ in terms of $\int_a^b f(x)g'(x)\,dx$.


5.6 Fundamental Theorem of Calculus 


The Fundamental Theorem of Calculus does two things at once: It shows that there 
is a relation between derivatives and integrals (by definition these two concepts are
quite distinct; recall that by the term “integral” we mean “definite integral”), and it
gives us a method for calculating integrals (doing so directly from the definition of 
integrals is very difficult in all but the simplest cases). If calculus is to mathematics 
as Shakespeare is to English literature, then the Fundamental Theorem of Calculus 
is Hamlet or King Lear. Mathematically, it does not get much better than this. It is 
probably fair to say that without the Fundamental Theorem of Calculus, integrals— 
one of the most important mathematical tools in science and technology—would not 
be very usable, and in that case much of modern science and technology would not 
exist. 

There are actually two versions of the Fundamental Theorem of Calculus. The 
two versions are equivalent to each other, and the order in which they are discussed 
does not matter; we will follow the more customary order. Both versions essentially 
say that differentiation and integration are inverse operations. In the first version we 
do integration first and then differentiation, and in the other version we do the reverse. 
In our discussion we will make use of the concept of an antiderivative, as defined in 
Definition 4.4.8; note that antiderivatives are defined strictly in terms of derivatives. 
If we can establish a connection between integrals and antiderivatives, then we will 
have established a connection between integrals and derivatives. 

How are integrals and derivatives related? As defined, these two concepts are 
somewhat hard to compare, because derivatives take functions and yield functions, 
whereas integrals take functions (and closed bounded intervals) and yield numbers. 
One way to compare integrals with derivatives is to modify integrals to obtain functions
from them. More specifically, the idea is to let the “b” in $\int_a^b f(x)\,dx$ vary. That is, we
want to consider integrals of the form $\int_a^x f(t)\,dt$, where we think of x as a “variable.”
(Observe that we wrote f(t) dt in the above integral rather than f(x) dx, to emphasize
that the symbol x in the integral $\int_a^x f(t)\,dt$ is not the same as the “dummy variable” x
in the integral $\int_a^b f(x)\,dx$.)

Example 5.6.1.


(1) Let f: [0,2] → R be defined by f(x) = x for all x ∈ [0,2]. Let F: [0,2] → R
be defined by

\[
F(x) = \int_1^x f(t)\,dt
\]

for all x ∈ [0,2]. We want to find an explicit formula for F; recall that we do not
yet have the Fundamental Theorem of Calculus at our disposal; we are looking at
this example as motivation for that theorem. Because the function f is so simple, a
formula for F can be found using some basic geometry. Let x ∈ [0,2]. As seen in
Figure 5.6.1, the value of F(x) represents the area of the shaded trapezoid when x > 1;
a similar argument holds when x < 1, and we omit the details. The two bases of the
trapezoid have lengths 1 and x, and the height of the trapezoid is x − 1. Using the
formula for the area of a trapezoid, it follows that

\[
F(x) = \frac{1+x}{2} \cdot (x-1) = \frac{1}{2}x^2 - \frac{1}{2}.
\]


It is evident that F is differentiable, and that F’ = f. We have therefore seen a concrete 
example of the Fundamental Theorem of Calculus Version I (Theorem 5.6.2), which 
we will prove after this example. 


Fig. 5.6.1.


(2) Let h: [0,2] → R be defined by

\[
h(x) = \begin{cases} 1, & \text{if } x \in [0,1] \\ 2, & \text{if } x \in (1,2]. \end{cases}
\]

By Exercise 5.2.5 we know that h is integrable. Let H: [0,2] → R be defined by

\[
H(x) = \int_0^x h(t)\,dt
\]

for all x ∈ [0,2]. Using Figure 5.6.2, it is left to the reader to verify that

\[
H(x) = \begin{cases} x, & \text{if } x \in [0,1] \\ 2x-1, & \text{if } x \in (1,2], \end{cases}
\]

and that H is not differentiable at x = 1.


Fig. 5.6.2. 


Hence, it is not always the case that a function of the form $\int_a^x h(t)\,dt$ is
differentiable, and, of course, when $\int_a^x h(t)\,dt$ is not differentiable, then it makes no sense to
assert that its derivative is h. In fact, combining Example 4.4.11 with Exercise 4.2.3,
we see that the function h is not the derivative of any function. ◊
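
The failure of differentiability in Part (2) can also be seen numerically. The following Python sketch is only an illustration: it takes h to be the step function equal to 1 on [0,1] and 2 on (1,2], takes H(x) to be the integral of h from 0 to x, and compares the one-sided difference quotients of H at x = 1. The helper names `midpoint_integral`, `left_quotient` and `right_quotient`, and the step size, are assumptions made for this example.

```python
def midpoint_integral(f, p, q, n=20_000):
    """Midpoint-sum approximation of the integral of f over [p, q], for p <= q."""
    width = (q - p) / n
    return sum(f(p + (i + 0.5) * width) for i in range(n)) * width


def h(t):
    return 1.0 if t <= 1.0 else 2.0


def H(x):
    """Integral of h from 0 to x, split at the jump of h at t = 1."""
    if x <= 1.0:
        return midpoint_integral(h, 0.0, x)
    return midpoint_integral(h, 0.0, 1.0) + midpoint_integral(h, 1.0, x)


step = 1e-3
left_quotient = (H(1.0) - H(1.0 - step)) / step
right_quotient = (H(1.0 + step) - H(1.0)) / step
print(left_quotient, right_quotient)  # roughly 1 and roughly 2
```

The two one-sided quotients stay near different values however small the step is taken, which is the numerical shadow of the fact that H has a corner at x = 1.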


As we saw in the two parts of Example 5.6.1, sometimes functions of the form
$\int_a^x f(t)\,dt$ are differentiable and sometimes they are not. As we now see in the first
version of the Fundamental Theorem of Calculus, if a function f: [a,b] → R is
continuous then $\int_a^x f(t)\,dt$ is differentiable, and it is an antiderivative of f. We note, however,
that continuity is not necessary for $\int_a^x f(t)\,dt$ to be differentiable; see Exercise 5.6.2.
In general, a function of the form $\int_a^x f(t)\,dt$ is better behaved than the function f,
even when f is not continuous; it is seen in Exercise 5.6.5 that if f is integrable, then
$\int_a^x f(t)\,dt$ is uniformly continuous.


Theorem 5.6.2 (Fundamental Theorem of Calculus Version I). Let I ⊆ R be a
non-degenerate interval, let a ∈ I and let f: I → R be a function. Suppose that $f|_C$ is
integrable for every non-degenerate closed bounded interval C ⊆ I. Let F: I → R be
defined by

\[
F(x) = \int_a^x f(t)\,dt
\]

for all x ∈ I. Let c ∈ I. If f is continuous at c, then F is differentiable at c and
F'(c) = f(c). If f is continuous, then F is differentiable and F' = f.


Proof. Suppose that f is continuous at c. Suppose further that c is not a right endpoint
of I. We will show that

\[
\lim_{h \to 0^+} \frac{F(c+h) - F(c)}{h} = f(c). \tag{5.6.1}
\]

A similar argument, the details of which we omit, shows that if c is not a left endpoint
of I, then

\[
\lim_{h \to 0^-} \frac{F(c+h) - F(c)}{h} = f(c).
\]

These two cases together imply that F'(c) = f(c), taking into account that if c is an
endpoint of I then a one-sided derivative is used.

Let ε > 0. Because f is continuous at c, there is some δ > 0 such that w ∈ I and
|w − c| < δ imply that |f(w) − f(c)| < ε/2. Because c is not a right endpoint of I, we
use the one-sided analog of Lemma 2.3.7 (2) to see that by taking a smaller value
of δ if necessary, we may assume that [c, c+δ) ⊆ I. Hence w ∈ [c, c+δ) implies
|f(w) − f(c)| < ε/2.

Let h ∈ (0,δ). Then t ∈ [c, c+h] implies |f(t) − f(c)| < ε/2. It follows from
Theorem 5.3.1 (4) that

\[
\frac{1}{h}\int_c^{c+h} f(c)\,dt = \frac{1}{h} f(c)(c+h-c) = f(c).
\]

We now use Corollary 5.5.9, Theorem 5.5.5 and Theorem 5.3.2 (3) to see that

\[
\begin{aligned}
\left|\frac{F(c+h) - F(c)}{h} - f(c)\right|
&= \left|\frac{1}{h}\left[\int_a^{c+h} f(t)\,dt - \int_a^c f(t)\,dt\right] - f(c)\right| \\
&= \left|\frac{1}{h}\int_c^{c+h} f(t)\,dt - \frac{1}{h}\int_c^{c+h} f(c)\,dt\right| \\
&= \frac{1}{h}\left|\int_c^{c+h} [f(t) - f(c)]\,dt\right| \le \frac{1}{h}\int_c^{c+h} |f(t) - f(c)|\,dt \\
&\le \frac{1}{h} \cdot \frac{\varepsilon}{2} \cdot (c+h-c) = \frac{\varepsilon}{2} < \varepsilon.
\end{aligned}
\]


It then follows from the definition of one-sided limits that Equation 5.6.1 holds. 
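
The limit in Equation 5.6.1 can be watched numerically. The following Python sketch is a hedged illustration, not part of the proof: it assumes f = cos, a = 0 and c = 1, approximates F by midpoint sums, and prints the right-hand difference quotients of F at c for a few step sizes. The helper name `midpoint_integral` and the particular step sizes are choices made only for this example.

```python
import math


def midpoint_integral(f, p, q, n=100_000):
    """Midpoint-sum approximation of the integral of f over [p, q], for p <= q."""
    width = (q - p) / n
    return sum(f(p + (i + 0.5) * width) for i in range(n)) * width


def F(x):
    """F(x) = integral of cos from a = 0 to x (this sketch assumes x >= 0)."""
    return midpoint_integral(math.cos, 0.0, x)


c = 1.0
quotients = {h: (F(c + h) - F(c)) / h for h in (0.1, 0.01, 0.001)}
for h, quotient in quotients.items():
    # The last column is the distance from the quotient to f(c) = cos(1).
    print(h, quotient, abs(quotient - math.cos(c)))
```

As h shrinks, the difference quotient approaches cos(1), which is what the theorem predicts for a function that is continuous at c.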


Observe that in the statement of the Fundamental Theorem of Calculus Version I
(Theorem 5.6.2), if the interval I is closed and bounded, then it is sufficient to assume
that f is integrable, because of Theorem 5.5.6.

The Fundamental Theorem of Calculus Version I, together with Exercise 3.3.2 (2)
and Theorem 5.4.11, immediately implies the following important fact.


Corollary 5.6.3. Let I ⊆ R be a non-degenerate interval, and let f: I → R be a
function. If f is continuous, then f has an antiderivative.


Whereas Corollary 5.6.3 says that in principle every continuous function has an
antiderivative, it says nothing about how to compute such antiderivatives in practice.
Indeed, as the reader has seen in calculus courses, computing antiderivatives can be
quite tricky. In contrast to differentiation, where we have a short list of rules that
allow us to take the derivative of virtually any function that can be built up out of
elementary functions using sums, differences, products, quotients and compositions,
there is no simple list of techniques for finding antiderivatives of all such functions.
Some antiderivatives are very difficult to compute, requiring ad hoc techniques, and
some cannot be expressed at all as nice formulas made up out of elementary functions.
For example, let f: R → R be defined by $f(x) = e^{-x^2}$ for all x ∈ R. The Fundamental
Theorem of Calculus Version I implies that the function g: R → R defined by
$g(x) = \int_0^x e^{-t^2}\,dt$ for all x ∈ R is an antiderivative for f, but this antiderivative is not very
useful in practice, and in fact there is no simple formula for this antiderivative, which
is a great pity, because this particular antiderivative is useful in a variety of applications
of mathematics, for example probability and statistics. See [Kas80] or [MZ94] for
general discussion and historical remarks about which antiderivatives can be written 
“in finite terms” (which means as an appropriate type of formula involving elementary 
functions), and see [Ros72] for a precise statement and detailed proof of such a result 
(the proof involves ideas from both complex analysis and abstract algebra). We will 
return to the question of finding antiderivatives later in this section, but for now we 
continue with our discussion of the Fundamental Theorem of Calculus. 
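
Even without an elementary formula, an antiderivative of this kind can be evaluated numerically. The following Python sketch is an illustration under stated assumptions: it takes the lower limit of integration to be 0, approximates the integral of exp(−t²) from 0 to x by a midpoint sum, and compares the result with the standard-library function `math.erf` through the standard identity that this integral equals (√π/2)·erf(x). The error function is itself defined by means of this integral, so the comparison is a consistency check and does not supply a formula in finite terms. The helper names `midpoint_integral`, `g` and `g_via_erf` are hypothetical.

```python
import math


def midpoint_integral(f, p, q, n=20_000):
    """Midpoint-sum approximation of the integral of f over [p, q], for p <= q."""
    width = (q - p) / n
    return sum(f(p + (i + 0.5) * width) for i in range(n)) * width


def g(x):
    """Approximates the integral of exp(-t^2) from 0 to x (assumes x >= 0)."""
    return midpoint_integral(lambda t: math.exp(-t * t), 0.0, x)


def g_via_erf(x):
    """The same quantity expressed through the error function erf."""
    return math.sqrt(math.pi) / 2.0 * math.erf(x)


for x in (0.5, 1.0, 2.0):
    print(x, g(x), g_via_erf(x))
```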

In Version I of the Fundamental Theorem of Calculus we first integrated a function
(albeit with the “variable” x instead of b), and then differentiated the result of the
integration, and we obtained the function with which we started. In Version II of
the Fundamental Theorem of Calculus we reverse the order of differentiation and
integration. The basic idea is to start with a function f: [a,b] → R, take its derivative,
and then integrate the derivative, which yields $\int_a^b f'(x)\,dx$. We cannot directly compare
this integral with the original function, because the integral is a number, and not a
function, but this problem can be avoided, as suggested by the following intuitive
idea.

Suppose that s: [a,b] → R represents the position of an object on the x-axis as
a function of time. We want to compute the average velocity of the object from
time t = a to time t = b. One way to calculate average velocity is to divide the total
distance by the total time, which yields $\frac{s(b)-s(a)}{b-a}$. On the other hand, there is a general
way to calculate the average value of an integrable function g: [a,b] → R, which is
$\frac{1}{b-a}\int_a^b g(x)\,dx$. We assume that the reader is informally familiar with this formula
for average value from a calculus course; it is discussed in detail in Example 8.4.6.
Hence, we can calculate the average velocity of our position function as $\frac{1}{b-a}\int_a^b s'(t)\,dt$,
assuming that s' is integrable. If the world is as nice as one would hope, then these
two methods of computing average velocity ought to give the same result, which
means that $\frac{1}{b-a}\int_a^b s'(t)\,dt = \frac{s(b)-s(a)}{b-a}$, and hence that $\int_a^b s'(t)\,dt = s(b) - s(a)$. What
we see is that the integral $\int_a^b s'(t)\,dt$, which is a number, is related to the function s by
looking at the values of s at the endpoints of [a,b], and that is how we solve the problem
of comparing numbers and functions in Version II of the Fundamental Theorem of
Calculus. Indeed, the formula $\int_a^b s'(t)\,dt = s(b) - s(a)$ is the same as the Fundamental
Theorem of Calculus Version II, except that we need to rename s' by f, and then s
is an antiderivative of f, which we call F. Of course, this intuitive argument is not a
proof, but it lends plausibility.
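
The two ways of computing average velocity can be compared numerically. The following Python sketch is only a plausibility check in the same spirit as the argument above: it assumes the position function s(t) = t³ on [1,3], computes the average velocity once from the endpoint positions and once as the average value of s', and prints both. The names `midpoint_integral`, `s` and `s_prime` and the chosen interval are assumptions for this example.

```python
def midpoint_integral(f, p, q, n=20_000):
    """Midpoint-sum approximation of the integral of f over [p, q], for p <= q."""
    width = (q - p) / n
    return sum(f(p + (i + 0.5) * width) for i in range(n)) * width


def s(t):
    """Hypothetical position function."""
    return t ** 3


def s_prime(t):
    """Velocity, the derivative of s."""
    return 3.0 * t ** 2


a, b = 1.0, 3.0
average_from_positions = (s(b) - s(a)) / (b - a)
average_from_integral = midpoint_integral(s_prime, a, b) / (b - a)
print(average_from_positions, average_from_integral)
```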
Theorem 5.6.4 (Fundamental Theorem of Calculus Version II). Let [a,b] ⊆ R
be a non-degenerate closed bounded interval, and let f: [a,b] → R be a function.
Suppose that f is integrable and f has an antiderivative. If F: [a,b] → R is an
antiderivative of f, then

\[
\int_a^b f(x)\,dx = F(b) - F(a).
\]


Proof. Let F: [a,b] → R be an antiderivative of f. Let ε > 0. Because f is integrable,
there is some δ > 0 such that if P is a partition of [a,b] with ‖P‖ < δ, and if T is a
representative set of P, then $|S(f,P,T) - \int_a^b f(x)\,dx| < \varepsilon$. Let $R = \{x_0, x_1, \ldots, x_n\}$ be
a partition of [a,b] with ‖R‖ < δ, which exists by Exercise 5.2.1.

Because F is differentiable (its derivative is f), then F is continuous by
Theorem 4.2.4. Let i ∈ {1,...,n}. The Mean Value Theorem (Theorem 4.4.4) applied to
$F|_{[x_{i-1},x_i]}$ implies that there is some $s_i \in (x_{i-1}, x_i)$ such that

\[
F'(s_i) = \frac{F(x_i) - F(x_{i-1})}{x_i - x_{i-1}}.
\]

Hence $F(x_i) - F(x_{i-1}) = F'(s_i)(x_i - x_{i-1}) = f(s_i)(x_i - x_{i-1})$. Let $S = \{s_1, s_2, \ldots, s_n\}$.
Then S is a representative set of R. Therefore

\[
S(f,R,S) = \sum_{i=1}^{n} f(s_i)(x_i - x_{i-1}) = \sum_{i=1}^{n} [F(x_i) - F(x_{i-1})] = F(x_n) - F(x_0) = F(b) - F(a).
\]

Because S is a representative set of R, then $|S(f,R,S) - \int_a^b f(x)\,dx| < \varepsilon$, and it
follows that $|[F(b) - F(a)] - \int_a^b f(x)\,dx| < \varepsilon$. Because ε was arbitrarily chosen, it follows from
Lemma 2.3.10 (3) that $\int_a^b f(x)\,dx = F(b) - F(a)$.
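
The heart of this proof is that, for the representative points supplied by the Mean Value Theorem, the Riemann sum telescopes to F(b) − F(a) for every partition. The following Python sketch illustrates this in one case where those points can be written down explicitly. It assumes f(x) = x² and F(x) = x³/3 on [0,2]; for this F, the point s in (u,v) with F'(s) = [F(v) − F(u)]/(v − u) satisfies s² = (v² + vu + u²)/3 when 0 ≤ u < v. The partition and the helper name `mean_value_point` are hypothetical choices for the example.

```python
import math


def F(x):
    return x ** 3 / 3.0


def f(x):
    return x * x


def mean_value_point(left, right):
    """Point s in (left, right) with F'(s) = (F(right) - F(left)) / (right - left).

    Closed form valid for F(x) = x^3 / 3 and 0 <= left < right.
    """
    return math.sqrt((right * right + right * left + left * left) / 3.0)


a, b = 0.0, 2.0
partition = [a, 0.3, 0.7, 1.2, 1.5, b]  # deliberately uneven
riemann_sum = sum(
    f(mean_value_point(x_prev, x_next)) * (x_next - x_prev)
    for x_prev, x_next in zip(partition, partition[1:])
)
print(riemann_sum, F(b) - F(a))  # equal up to floating-point rounding
```

Even this coarse partition reproduces F(b) − F(a) up to rounding; the fineness of the partition is needed in the proof only to bring the Riemann sum close to the integral.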


The expression “F(b) − F(a)” in the Fundamental Theorem of Calculus Version
II (Theorem 5.6.4) is so frequently used in calculations of integrals that it is often
written in the compact notation “$F(x)\big|_a^b$” or some variant of that. We will not make
use of such notation in the statements of theorems or in proofs, though for convenience
we will use it in some examples.

The reader might wonder whether the hypotheses of the Fundamental Theorem of 
Calculus Version II (Theorem 5.6.4), which are that the function f is integrable and has 
an antiderivative, are redundant, in that these two criteria might appear to be related. 
However, even though the two versions of the Fundamental Theorem of Calculus 
show a relationship between integration and antidifferentiation, the relationship is not 
as straightforward as one might expect. As we see in Example 5.6.5, a function can 
have an antiderivative and yet not be integrable, and a function can be integrable and 
yet not have an antiderivative. 


Example 5.6.5.
(1) Let f: R → R be defined by

\[
f(x) = \begin{cases} x^2 \sin\dfrac{1}{x^2}, & \text{if } x \neq 0 \\ 0, & \text{if } x = 0. \end{cases}
\]


It was shown in Example 4.2.5 (1) that f is differentiable. However, it was also noted 
in that example that the values of f’ go to infinity as x goes to 0 from the right, and 
hence f’ is not bounded on any closed bounded interval containing 0 in its interior. 
Therefore f’ cannot be integrable on any closed bounded interval containing 0 in its 
interior by Theorem 5.3.3. Viewed another way, the function f’ has an antiderivative, 
namely, the function f, and yet it is not integrable on any closed bounded interval that 
contains 0 in its interior. 
(2) Let h: [0,2] → R be defined by

\[
h(x) = \begin{cases} 1, & \text{if } x \in [0,1] \\ 2, & \text{if } x \in (1,2]. \end{cases}
\]

The function h is integrable by Exercise 5.2.5. However, as noted in Example 5.6.1 (2),
the function h is not the derivative of any function, which is another way of saying
that h does not have an antiderivative.


We know by Theorem 5.4.11 and Corollary 5.6.3 that a continuous function is
both integrable and has an antiderivative, and therefore the Fundamental Theorem of
Calculus Version II applies to any continuous function. Are there functions that are
both integrable and have antiderivatives, but are not continuous? If not, then we could
just as well have stated the Fundamental Theorem of Calculus Version II with the
simpler hypothesis that f is continuous. It turns out, however, as the reader is asked
to show in Exercise 5.8.7, that there are such functions, which is why we stated the
hypotheses of the Fundamental Theorem of Calculus Version II as we did.

Although our initial motivation for Version II of the Fundamental Theorem of 
Calculus was that we wanted to reverse the order of integration and differentiation 
from what was done in Version I, in fact Version II turns out to be by far the more 
useful of the two versions in the applications of calculus. The reader has undoubtedly 
computed many integrals via the Fundamental Theorem of Calculus Version II in
calculus courses, and so we will provide only one such example, together with an 
example of how not to use the Fundamental Theorem of Calculus. 


Example 5.6.6. 


(1) Let g: [0,2] — R be defined by g(x) = x* for all x € [0,2]. We know by 
Example 3.3.7 (1) that g is continuous. As remarked above, it follows that g satisfies 
the hypotheses of the Fundamental Theorem of Calculus Version II (Theorem 5.6.4). 
Let G: [0,2] — R be defined by G(x) = = for all x € [0,2]. Then G is an antiderivative 
of g, as the reader can easily verify. The Fundamental Theorem of Calculus Version
II now implies that

(2) As useful as the Fundamental Theorem of Calculus Version II is, it is important
not to get carried away using this wonderful theorem when it is not applicable. When
faced with the integral $\int_{-1}^{1} \frac{1}{x^2}\,dx$, a student in a calculus course who has just learned
the Fundamental Theorem of Calculus Version II might try the calculation

\[
\int_{-1}^{1} \frac{1}{x^2}\,dx = \int_{-1}^{1} x^{-2}\,dx = -\frac{1}{x}\bigg|_{-1}^{1} = \left(-\frac{1}{1}\right) - \left(-\frac{1}{-1}\right) = -2.
\]


This calculation cannot be correct, because the function given by $f(x) = \frac{1}{x^2}$ is always
positive, so its integral cannot be negative by Theorem 5.3.2 (1). The problem is
that the hypotheses of the Fundamental Theorem of Calculus Version II are not
satisfied by the function f. Indeed, this function f is not defined on the whole
interval [−1,1]. It would be possible to extend f to the interval [−1,1] by giving
it an arbitrary value at x = 0, but the extended function is not bounded, and hence
it is not integrable by Theorem 5.3.3; the extended function also does not have an
antiderivative, because the function given by the formula $-\frac{1}{x}$, which is the
antiderivative on [−1,0) ∪ (0,1], cannot be extended to a continuous function on
[−1,1], and therefore it cannot be extended to a differentiable function by
Theorem 4.2.4. Hence, it is crucial to make sure that the Fundamental Theorem of Calculus
Version II is applicable before trying to use it.

We will see the proper way of dealing with the integral $\int_{-1}^{1} \frac{1}{x^2}\,dx$ in Example 6.4.10,
after we have defined improper integrals. ◊
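
A numerical experiment makes the failure in Part (2) visible. The following Python sketch is a hedged illustration: it evaluates the tempting antiderivative formula at the endpoints, and then computes midpoint sums of 1/x² over [−1,1] with an even number of subintervals (so that no midpoint is 0). The helper name `midpoint_sum` and the chosen values of n are assumptions for this example.

```python
def midpoint_sum(f, p, q, n):
    """Midpoint sum of f over [p, q] with n equal subintervals."""
    width = (q - p) / n
    return sum(f(p + (i + 0.5) * width) for i in range(n)) * width


def f(x):
    return 1.0 / (x * x)


def F(x):
    """Antiderivative of f on each of [-1, 0) and (0, 1], but not across 0."""
    return -1.0 / x


naive_value = F(1.0) - F(-1.0)
sums = [midpoint_sum(f, -1.0, 1.0, n) for n in (10, 100, 1000)]
print(naive_value)  # negative, although f is positive everywhere it is defined
print(sums)         # keeps growing as n increases
```

The two subintervals adjacent to 0 alone contribute 4n to the midpoint sum, so the sums grow without bound instead of settling near any number, let alone a negative one.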


In contrast to the proof of the Fundamental Theorem of Calculus Version I
(Theorem 5.6.2), which required results that were proved in Section 5.5, the proof of the
Fundamental Theorem of Calculus Version II (Theorem 5.6.4) required nothing about
integrals beyond their definition given in Section 5.2. The Fundamental Theorem of
Calculus Version II could therefore have been stated and proved in that earlier section,
but we delayed stating and proving it until now in order to keep both versions of the
Fundamental Theorem of Calculus together.

The relation between the two versions of the Fundamental Theorem of Calculus
can be seen more easily by looking at the case of continuous functions. Although it is
not required that f be continuous in the statement of Version II of the Fundamental
Theorem of Calculus, as noted above continuous functions always satisfy the
hypotheses of this version. We now see that in the continuous case it is possible to use the
Fundamental Theorem of Calculus Version I to give a simpler proof of Version II.
Suppose that f: [a,b] → R is a continuous function, where [a,b] is a non-degenerate
closed bounded interval, and suppose that F: [a,b] → R is an antiderivative of f. Let
G: [a,b] → R be defined by

\[
G(x) = \int_a^x f(t)\,dt
\]

for all x ∈ [a,b]. Then G is an antiderivative of f by the Fundamental Theorem of
Calculus Version I, because we are assuming that f is continuous. Lemma 4.4.7 (2)
implies that there is some C ∈ R such that F(x) = G(x) + C for all x ∈ [a,b]. Then

\[
F(b) - F(a) = [G(b) + C] - [G(a) + C] = \int_a^b f(t)\,dt - \int_a^a f(t)\,dt = \int_a^b f(x)\,dx.
\]


Reflections 


Our two versions of the Fundamental Theorem of Calculus are the same as the 
versions seen in introductory calculus courses, except for the fact that we are now more 
careful with stating their precise hypotheses. In our statement of the Fundamental 
Theorem of Calculus Version I it should be noted that the continuity of f, and the 
corresponding differentiability of F, occur at single numbers, not necessarily on the 
whole interval. Even more interesting is the statement of the Fundamental Theorem 
of Calculus Version II, where it is required that f is integrable and that f has an 
antiderivative. Both of these conditions hold if f is continuous, and that is usually 
assumed when the Fundamental Theorem of Calculus Version II is stated in an 
introductory calculus course. However, it is important to note that for non-continuous 
functions, being integrable is a distinct condition from having an antiderivative, as 
noted in Example 5.6.5, where there is a function that is integrable and yet does not 
have an antiderivative, and there is a function that has an antiderivative and yet is not 
integrable. 

The Fundamental Theorem of Calculus Version II is stated for functions of a 
single variable, but it has generalizations to higher dimensions. Observe that under 
suitable hypotheses on f, this theorem can be restated as $\int_a^b f'(x)\,dx = f(b) - f(a)$.
In other words, the “total amount” of the derivative of f over the whole interval [a,b] 
can be found by evaluating the function f in an appropriate way on the endpoints of 
the interval. The key to generalizing this result to higher dimensions is to think of the 
endpoints of a non-degenerate closed bounded interval as the boundary of the interval. 
Theorems such as Green’s Theorem, the Divergence Theorem and Stokes’ Theorem, 
which one encounters in a multivariable calculus course, all relate the integral of some 
sort of derivative of a function on a region (in the plane or space) to the integral of the 
function on the boundary of the region (where the boundary is one dimension lower 
than the whole region). Such theorems are direct generalizations of the Fundamental 
Theorem of Calculus Version II. An even more general approach to this matter, which 
includes the above-mentioned theorems as special cases, is the generalized Stokes’ 
Theorem on smooth manifolds; see the classic text [Spi65] for details. 


Exercises 


Exercise 5.6.1. Find the derivative of each of the following functions. 


(1) Let F: [0,5] — R be defined by 


F (x) =| Pat 


for all x ∈ [0,5].

(2) Let G: [1,2] → R be defined by


G(x) = | “(8 4-7)dt 


for all x € [1,2]. 


Exercise 5.6.2. [Used in Section 5.6.] Find an example of a function f: [a,b] → R
for some non-degenerate closed bounded interval [a,b] ⊆ R such that f is integrable
but not continuous, and that the function F: [a,b] → R defined by

\[
F(x) = \int_a^x f(t)\,dt
\]

for all x ∈ [a,b] is differentiable. For your function f, does F' = f?


Exercise 5.6.3. Let a ∈ (0,∞), and let g: [−a,a] → R be a function. Suppose that g
is integrable. Let G: [−a,a] → R be defined by

\[
G(x) = \int_{-x}^{x} g(t)\,dt
\]

for all x ∈ [−a,a]. Prove that if g is continuous, then G is differentiable, and
G'(x) = g(x) + g(−x) for all x ∈ [−a,a].


Exercise 5.6.4. [Used in Theorem 6.4.11 and Theorem 9.3.6.] Let I ⊆ R be a
non-degenerate interval that has the form [a,b) or [a,∞) for some a,b ∈ R, and let f: I → R
be a function. Suppose that f(x) ≥ 0 for all x ∈ I, and that $f|_{[a,t]}$ is integrable for every
t ∈ I. Let F: I → R be defined by

\[
F(x) = \int_a^x f(t)\,dt
\]

for all x ∈ I. Prove that F is increasing.


Exercise 5.6.5. [Used in Section 5.6.] Let [a,b] ⊆ R be a closed bounded interval, and
let f: [a,b] → R be a function. Suppose that f is integrable. Let F: [a,b] → R be
defined by

\[
F(x) = \int_a^x f(t)\,dt
\]

for all x ∈ [a,b]. Prove that F is uniformly continuous.


Exercise 5.6.6. [Used in Theorem 5.7.4.] Let [a,b] ⊆ R be a non-degenerate closed
bounded interval, and let f: [a,b] → R be a function. Suppose that f is integrable
and f has an antiderivative. Let F: [a,b] → R be an antiderivative of f. Prove that if
s,t ∈ [a,b], then

\[
\int_s^t f(x)\,dx = F(t) - F(s).
\]


5.7 Computing Antiderivatives


The Fundamental Theorem of Calculus Version II (Theorem 5.6.4) shows that to 
compute definite integrals, it is necessary to be able to compute antiderivatives. We 
saw the definition of antiderivatives in the latter part of Section 4.4; now that we know 
how important they are, we discuss some methods for computing them. 

Unfortunately, although the concept of an antiderivative is simple to define in 
principle, in practice it is not always easy to compute antiderivatives of complicated
functions. Moreover, as we saw in Example 4.4.11, not every function has an
antiderivative. By Corollary 5.6.3 we know that every continuous function has an 
antiderivative. Some discontinuous functions also have antiderivatives, as we saw 
in Example 4.2.5 (1), where there is an example of a differentiable function with a 
discontinuous derivative, and therefore that derivative is a discontinuous function with 
an antiderivative. 

If a function has an antiderivative, then it will have more than one, though fortunately
such antiderivatives cannot be very different from one another, because
Corollary 4.4.9 states that any two antiderivatives of a given function defined on an
open interval differ by a constant. Hence, once we know one antiderivative of a function,
we know all others by adding constants to the first antiderivative. In other words,
once we find a single antiderivative, we can obtain the most general antiderivative by 
adding “+C” to the single antiderivative, where C is an arbitrary real number. We 
now give a name to this general antiderivative. 


Definition 5.7.1. Let I ⊆ R be an open interval, and let f: I → R be a function.
Suppose that f has an antiderivative. The indefinite integral of f, denoted $\int f(x)\,dx$,
is the most general antiderivative of f. If F is any antiderivative of f, then

\[
\int f(x)\,dx = F(x) + C,
\]

where C is an arbitrary real number. △


Given that an indefinite integral is the most general antiderivative, the “+C” in 
indefinite integration is crucial. By contrast, an antiderivative is a single function, and 
does not include the “+ C.” 

It is worth stressing that what we called an “integral” without any adjective prior 
to Definition 5.7.1 is what is referred to in calculus courses as a “definite integral.” 
In general, we will continue to use the word “integral” without an adjective in that 
meaning, and if we mean “indefinite integral” we will refer to it as such, unless it is 
clear from the context that the word “integral” alone means “indefinite integral.” 

The notation $\int f(x)\,dx$ for indefinite integrals, though very standard, is rather
unfortunate, in that it looks very similar to the notation $\int_a^b f(x)\,dx$ for “definite
integrals.” It is true that these two types of integrals are ultimately related by the
Fundamental Theorem of Calculus, but it is important to keep in mind that definite
integrals and indefinite integrals are very different constructions, and mean very
different things; that these two concepts are related is a remarkable theorem, not an
obvious consequence of their definitions. As such, it would be less confusing if these
two concepts had names and notations that were not so similar, but we are stuck with
the standard notation.

We conclude this section with some basic facts about indefinite integrals, starting 
with the most elementary properties, which are analogous to the corresponding 
properties of derivatives and definite integrals. 


Theorem 5.7.2. Let $I \subseteq \mathbb{R}$ be an open interval, let $f, g\colon I \to \mathbb{R}$ be functions and let 
$k \in \mathbb{R}$. Suppose that $f$ and $g$ have antiderivatives. 


1. $f + g$ has an antiderivative and $\int [f + g](x)\,dx = \int f(x)\,dx + \int g(x)\,dx$. 
2. $f - g$ has an antiderivative and $\int [f - g](x)\,dx = \int f(x)\,dx - \int g(x)\,dx$. 
3. $kf$ has an antiderivative and $\int [kf](x)\,dx = k \int f(x)\,dx$. 


Proof. We will prove Part (1), leaving the rest to the reader in Exercise 5.7.1. 


(1) Let $F, G\colon I \to \mathbb{R}$ be antiderivatives of $f$ and $g$, respectively. By Theorem 4.3.1 (1) 
we know that $[F + G]' = F' + G' = f + g$. Hence $F + G$ is an antiderivative of $f + g$. 
It follows that 

\[ \int [f + g](x)\,dx = F(x) + G(x) + C, \]

where $C$ is an arbitrary constant. We also know that 

\[ \int f(x)\,dx = F(x) + D \quad\text{and}\quad \int g(x)\,dx = G(x) + E, \]

where $D$ and $E$ are arbitrary constants. However, if $D$ and $E$ are arbitrary constants, 
then so is $D + E$, and hence we can write 

\[ \int [f + g](x)\,dx = F(x) + G(x) + D + E. \]

It follows immediately that 

\[ \int [f + g](x)\,dx = \int f(x)\,dx + \int g(x)\,dx. \]


If there is one single most useful technique of integration, it is the one given in 
the following theorem, which is based upon the Chain Rule. 


Theorem 5.7.3 (Integration by Substitution). Let $I, J \subseteq \mathbb{R}$ be open intervals, and 
let $g\colon I \to J$ and $f\colon J \to \mathbb{R}$ be functions. Suppose that $g$ is differentiable. 
If $F\colon J \to \mathbb{R}$ is an antiderivative of $f$, then 

\[ \int f(g(x))\,g'(x)\,dx = F(g(x)) + C \]

for all $x \in I$. 


Proof. By definition we know that $F$ is differentiable and $F' = f$. Hence, by the 
Chain Rule (Theorem 4.3.3) we deduce that $F \circ g$ is differentiable and 
$[F \circ g]'(x) = F'(g(x)) \cdot g'(x) = f(g(x)) \cdot g'(x)$ for all $x \in I$. Therefore $F \circ g$ is an 
antiderivative of $(f \circ g)g'$. 
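
As a quick illustration of Theorem 5.7.3 (the particular functions are chosen here only as an example and do not come from the text), take $g(x) = x^2 + 1$, $f(u) = u^3$ and $F(u) = u^4/4$, so that $F' = f$ and $g'(x) = 2x$:

```latex
\[
\int 2x\,(x^2+1)^3\,dx
  = \int f(g(x))\,g'(x)\,dx
  = F(g(x)) + C
  = \frac{(x^2+1)^4}{4} + C .
\]
```

Differentiating the right-hand side with the Chain Rule gives back $2x\,(x^2+1)^3$, exactly as in the proof.
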
The reader might recall from a calculus course a slightly different formulation of 
Integration by Substitution than the one we have given in Theorem 5.7.3. The calculus 
course version is usually stated as 

\[ \int f(g(x))\,g'(x)\,dx = \int f(u)\,du, \]

where the substitution $u = g(x)$ is used. We do not use that formulation in Theorem 5.7.3, 
both because the formulation that we used is more in keeping with our 
discussion up till now, and because we have not dealt with “dx” other than as a formal 
symbol. The calculus course version, which is equivalent to the one we use, can be 
made rigorous if one deals properly with “dx” and “du,” though doing so would take 
us too far afield. The symbols “dx” and “du” are examples of differential forms; see 
[Spi65, “Fields and Forms”] for a rigorous treatment of this subject. 

Theorem 5.7.3 is for indefinite integrals. There is also a definite integral version 
of Integration by Substitution, which we now state and prove. Although to students 
in a calculus course it might appear that the definite integral version of Integration 
by Substitution is simply the result of putting the “a” and “b” on the integrals in the 
indefinite integral version, it is not rigorous to use “proof by similarity of notation.” 
In fact, because integrability and antidifferentiability are not the same, as we saw in 
Example 5.6.5, the following theorem is proved quite differently from Theorem 5.7.3, 
though both theorems ultimately rely upon the Chain Rule. 


Theorem 5.7.4 (Integration by Substitution for Definite Integrals). Let $[a,b], [c,d] \subseteq \mathbb{R}$ 
be non-degenerate closed bounded intervals, and let $g\colon [a,b] \to [c,d]$ and 
$f\colon [c,d] \to \mathbb{R}$ be functions. Suppose that $f$ is continuous, that $g$ is differentiable and 
that $g'$ is integrable. Then $(f \circ g)g'$ is integrable and 

\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(x)\,dx. \]

Proof. By Theorem 4.2.4 we know that $g$ is continuous, and hence $f \circ g$ is continuous 
by Theorem 3.3.8 (3). It follows from Theorem 5.4.11 that $f \circ g$ is integrable, and 
therefore $(f \circ g)g'$ is integrable by Theorem 5.5.4 (2). 

Let $F\colon [c,d] \to \mathbb{R}$ be defined by 

\[ F(x) = \int_c^x f(t)\,dt \]

for all $x \in [c,d]$. The Fundamental Theorem of Calculus Version I (Theorem 5.6.2) 
implies that $F$ is differentiable and $F' = f$. By the Chain Rule (Theorem 4.3.3) we 
see that $[F \circ g]' = (F' \circ g)g' = (f \circ g)g'$. Hence $[F \circ g]'$ is integrable by the previous 
paragraph. Observe also that $[F \circ g]'$ has an antiderivative, which is $F \circ g$. Because $f$ 
is continuous, $f$ is integrable by Theorem 5.4.11, and we also know that $f$ has 
an antiderivative, which is $F$. By the Fundamental Theorem of Calculus Version II 
(Theorem 5.6.4) applied to $[F \circ g]'$, and Exercise 5.6.6 applied to $f$, we see that 

\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_a^b [F \circ g]'(x)\,dx = F(g(b)) - F(g(a)) = \int_{g(a)}^{g(b)} f(x)\,dx. \]

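
As a numerical sanity check of Theorem 5.7.4, the following sketch compares the two sides of the formula using a midpoint-rule approximation. The functions $g(x) = x^2 + 1$ on $[0,2]$ and $f(u) = u^3$, the helper name `midpoint_integral`, and the number of subintervals are all illustrative assumptions rather than anything taken from the text; numerical agreement is of course not a proof.

```python
def midpoint_integral(h, a, b, n=20000):
    """Approximate the integral of h from a to b with the midpoint rule."""
    width = (b - a) / n
    return width * sum(h(a + (i + 0.5) * width) for i in range(n))


# Hypothetical choices for illustration: g(x) = x^2 + 1 on [0, 2] and f(u) = u^3.
def g(x):
    return x * x + 1.0


def g_prime(x):
    return 2.0 * x


def f(u):
    return u ** 3


a, b = 0.0, 2.0
# Left side of the theorem: integral over [a, b] of f(g(x)) g'(x).
left = midpoint_integral(lambda x: f(g(x)) * g_prime(x), a, b)
# Right side of the theorem: integral of f from g(a) to g(b).
right = midpoint_integral(f, g(a), g(b))
# F(u) = u^4 / 4 is an antiderivative of f, so the exact value is F(g(b)) - F(g(a)).
exact = (g(b) ** 4 - g(a) ** 4) / 4.0
```

Under these assumptions both approximations should agree with the exact value $F(5) - F(1) = 156$ up to the error of the midpoint rule.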
Here is another very useful method for computing antiderivatives, this one based 
upon the Product Rule. Once again, we have both an indefinite integral version and a 
definite integral version. 


Theorem 5.7.5 (Integration by Parts). Let $I \subseteq \mathbb{R}$ be an open interval, and let 
$f, g\colon I \to \mathbb{R}$ be functions. Suppose that $f$ and $g$ are continuously differentiable. Then 
$f'g$ and $fg'$ have antiderivatives and 

\[ \int f(x)\,g'(x)\,dx = f(x)g(x) - \int f'(x)\,g(x)\,dx. \]


Proof. Because $f$ and $g$ are differentiable, they are continuous by Theorem 4.2.4. 
Because $f$ and $g$ are continuously differentiable, by definition we know that 
$f'$ and $g'$ are continuous. It follows from Theorem 3.3.5 (4) that $f'g$ and $fg'$ are 
continuous, and hence by Corollary 5.6.3 we know that $f'g$ and $fg'$ have antiderivatives. 
By the Product Rule (Theorem 4.3.1 (4)) we know that $fg$ is differentiable and 
$[fg]' = f'g + fg'$. Therefore $fg$ is an antiderivative of $f'g + fg'$, and hence 

\[ \int [f'(x)g(x) + f(x)g'(x)]\,dx = f(x)g(x) + C \]

for all $x \in I$. Using Theorem 5.7.2 (1) we deduce that 

\[ \int f(x)\,g'(x)\,dx = f(x)g(x) - \int f'(x)\,g(x)\,dx + C. \]

However, we can now drop the “+ C,” because it is included in the indefinite integral 
on each side of the equals sign. 
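
For a concrete instance of Theorem 5.7.5 (an illustrative choice, not taken from the text), let $f(x) = x$ and $g(x) = (x+1)^3/3$, so that $f'(x) = 1$ and $g'(x) = (x+1)^2$:

```latex
\[
\int x\,(x+1)^2\,dx
  = \frac{x\,(x+1)^3}{3} - \int \frac{(x+1)^3}{3}\,dx
  = \frac{x\,(x+1)^3}{3} - \frac{(x+1)^4}{12} + C .
\]
```

Differentiating the final expression returns $x\,(x+1)^2$, which confirms the computation.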


Theorem 5.7.6 (Integration by Parts for Definite Integrals). Let $[a,b] \subseteq \mathbb{R}$ be 
a non-degenerate closed bounded interval, and let $f, g\colon [a,b] \to \mathbb{R}$ be functions. 
Suppose that $f$ and $g$ are differentiable, and that $f'$ and $g'$ are integrable. Then $f'g$ 
and $fg'$ are integrable and 

\[ \int_a^b f(x)\,g'(x)\,dx = [f(b)g(b) - f(a)g(a)] - \int_a^b f'(x)\,g(x)\,dx. \]

Proof. Because $f$ and $g$ are differentiable, they are continuous by Theorem 4.2.4, 
and hence they are integrable by Theorem 5.4.11. It follows from Theorem 5.5.4 (2) 
that $f'g$ and $fg'$ are integrable. By the Product Rule (Theorem 4.3.1 (4)) we know 
that $[fg]' = f'g + fg'$. Using Theorem 5.3.1 (1) we see that 

\[ \int_a^b [fg]'(x)\,dx = \int_a^b [f'(x)g(x) + f(x)g'(x)]\,dx = \int_a^b f'(x)\,g(x)\,dx + \int_a^b f(x)\,g'(x)\,dx. \]

Because $fg$ is an antiderivative of $[fg]'$, the Fundamental Theorem of Calculus Version 
II (Theorem 5.6.4) implies that 

\[ f(b)g(b) - f(a)g(a) = \int_a^b f'(x)\,g(x)\,dx + \int_a^b f(x)\,g'(x)\,dx, \]

and the desired result follows immediately. 


The reader might be more familiar with the formulation of Integration by Parts 
that is written $\int u\,dv = uv - \int v\,du$, rather than the formulation we used in Theorem 5.7.5. 
These two formulations of Integration by Parts are equivalent, but we 
chose the formulation we used for the same reasons that we chose our formulation of 
Integration by Substitution in Theorem 5.7.3. For the sake of familiarity and computational 
simplicity, however, we will feel free to use the more common formulation of 
Integration by Parts using $du$ and $dv$ in a few computations, for example in the proof 
of Theorem 7.4.4, which says that the area of a circle of radius $r$ is $\pi r^2$. 

There are other techniques of integration in addition to Integration by Substitution 
and Integration by Parts, for example partial fractions and trigonometric substitution. 
Each of these techniques is useful for a particular type of indefinite integral, and by 
using all of these techniques it is possible to evaluate many, though certainly not all, of 
the integrals one encounters in applications. These techniques of integration are also 
used by various computer algebra systems, which can evaluate many integrals very 
effectively. We will not give the details of any of these other techniques of integration 
here. See [BML, Section II.11] for a treatment of the algebra that underlies partial 
fractions. 


Reflections 


Although we defined the concept of antiderivatives in Section 4.4, we did not 
provide a notation for this concept at the time, and instead waited to define the notation 
$\int f(x)\,dx$ for indefinite integrals until the present section, which is after the notation 
for definite integrals was defined, lest the reader mistakenly think that definite integrals 
are defined in terms of indefinite integrals. As previously observed, the similarity of 
the notations for indefinite integrals and for definite integrals causes some confusion, 
because these two types of integrals are very different conceptually, and are related 
to one another not by definition, but by the Fundamental Theorem of Calculus (both 
versions). Though unfortunate, we are stuck with this notation for the two types of 
integrals for historical reasons. Somewhat surprisingly, the notation for indefinite 
integrals (the less fundamental concept) was not created by taking the notation for 
definite integrals (the more fundamental concept) and removing the “a” and “b,” but 
rather the other way around. The notation for indefinite integrals was due to Leibniz, 
right at the birth of calculus, and it proved very useful (especially in comparison with 
Newton’s notation, which did not include the “dx,” and which therefore did not work 
as nicely with substitutions); the notation for definite integrals was due to Fourier 
over a century after Leibniz. 

In addition to the difference in meaning between the notation $\int f(x)\,dx$ and the 
notation $\int_a^b f(x)\,dx$, another important difference between the two is that the “x” in 
the latter is a “dummy variable” whereas the x in the former is not. That is, if we 
change $x$ to $u$ in $\int_a^b f(x)\,dx$ we obtain $\int_a^b f(u)\,du$, which has the same numerical value 
as $\int_a^b f(x)\,dx$. By contrast, the indefinite integral $\int f(x)\,dx$ is a function with “variable” 
$x$, whereas $\int f(u)\,du$ is a function with “variable” $u$, and hence these two functions, 
though related, are not identical. Of course, the expressions $\int f(x)\,dx$ and $\int f(u)\,du$ 
are equally meaningful; the practice of some introductory calculus texts of writing 
their tables of integrals (which are indefinite integrals) with $u$ rather than $x$, as if that 
somehow makes the table of integrals more general, is quite silly, because a formula 
for $\int f(x)\,dx$ and a formula for $\int f(u)\,du$ tell us the same information. 


Exercises 


Exercise 5.7.1. [Used in Theorem 5.7.2.] Prove Theorem 5.7.2 (2) (3). 


Exercise 5.7.2. Prove that there do not exist numbers $a, b \in \mathbb{R}$ such that 
$\int [fg](x)\,dx = a\,g(x) \int f(x)\,dx + b\,f(x) \int g(x)\,dx$ for all functions $f, g\colon \mathbb{R} \to \mathbb{R}$ 
such that $f$, $g$ and $fg$ have antiderivatives. 


Exercise 5.7.3. Find an example of functions f,g: R— R such that f and g do not 
have antiderivatives, but that fg has an antiderivative. 


Exercise 5.7.4. Explain the flaw in the following attempted “proof” that 0 = 1: “Using 
Integration by Parts (Theorem 5.7.5) with $u = \frac{1}{x}$ and $dv = dx$, we see that 

\[ 0 + \int \frac{1}{x}\,dx = \int \frac{1}{x}\,dx = \frac{1}{x} \cdot x - \int \left( -\frac{1}{x^2} \right) x\,dx = 1 + \int \frac{1}{x}\,dx. \]

By canceling we deduce that 0 = 1.” 


Exercise 5.7.5. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is differentiable, that $f'$ is integrable 
and that $f$ is injective. We view $f$ as a function $f\colon [a,b] \to f([a,b])$, and hence 
$f^{-1}\colon f([a,b]) \to [a,b]$ exists. Prove that 

\[ \int_a^b f(x)\,dx + \int_{f(a)}^{f(b)} f^{-1}(x)\,dx = b f(b) - a f(a). \]

Make sure to verify that the hypotheses for any theorems used are satisfied. 
[Use Exercise 4.5.11 and Exercise 4.6.3 (2).] 

Exercise 5.7.6. [Used in Exercise 9.3.11 and Exercise 10.4.10.] 


(1) Let $p \in \mathbb{N} \cup \{0\}$, and let $a_0, a_1, \ldots, a_p, b_0, b_1, \ldots, b_p \in \mathbb{R}$. Prove that if 
$u, r, s \in \{0, \ldots, p\}$ and $u \le r \le s$, then 

\[ \sum_{i=r}^{s} a_i b_i = a_s \left( \sum_{k=u}^{s} b_k \right) - a_r \left( \sum_{k=u}^{r-1} b_k \right) - \sum_{i=r}^{s-1} (a_{i+1} - a_i) \left( \sum_{k=u}^{i} b_k \right), \]

where any summation of the form $\sum_{k=c}^{d}$ with $c > d$ is taken to be zero. 

(2) The formula in Part (1) of this exercise is known as Abel’s Formula. Of which 
theorem in this section is Abel’s Formula the discrete analog? 
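
Before attempting a proof, it can be reassuring to test Abel’s Formula on sample data. The sketch below does this with exact rational arithmetic; the function name `abel_sides`, the random test data, and the reading of the index condition as $u \le r \le s$ are assumptions made for illustration, and a numerical check does not replace the proof asked for in Part (1).

```python
from fractions import Fraction
import random


def abel_sides(a, b, u, r, s):
    """Return (left, right) sides of Abel's Formula for lists a, b and u <= r <= s."""
    def partial(top):
        # Sum of b[k] for k = u, ..., top; an empty range gives 0, matching the convention.
        return sum(b[k] for k in range(u, top + 1))

    left = sum(a[i] * b[i] for i in range(r, s + 1))
    right = (a[s] * partial(s)
             - a[r] * partial(r - 1)
             - sum((a[i + 1] - a[i]) * partial(i) for i in range(r, s)))
    return left, right


random.seed(0)  # fixed seed so that the illustration is reproducible
p = 6
a = [Fraction(random.randint(-9, 9), random.randint(1, 5)) for _ in range(p + 1)]
b = [Fraction(random.randint(-9, 9), random.randint(1, 5)) for _ in range(p + 1)]

# Compare both sides for every admissible triple u <= r <= s.
all_equal = all(
    left == right
    for u in range(p + 1)
    for r in range(u, p + 1)
    for s in range(r, p + 1)
    for left, right in [abel_sides(a, b, u, r, s)]
)
```

Using `Fraction` avoids rounding error, so the two sides can be compared with exact equality.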


Exercise 5.7.7. This exercise discusses the beta function. 
Let $B\colon [1,\infty) \times [1,\infty) \to \mathbb{R}$ be defined by 

\[ B(x,y) = \int_0^1 t^{x-1} (1-t)^{y-1}\,dt \]

for all $x, y \in [1,\infty)$. 

(1) Prove that $B(y,x) = B(x,y)$ for all $x, y \in [1,\infty)$. 
(2) Prove that $B(x,y) = \frac{x-1}{y} B(x-1, y+1)$ for all $x \in [2,\infty)$ and $y \in [1,\infty)$. 
(3) Prove that if $n, m \in \mathbb{N}$, then $B(n,m) = \frac{(n-1)!\,(m-1)!}{(n+m-1)!}$. (If the reader is familiar 
with the gamma function, as discussed in Exercise 6.4.13, then it will be 
observed that this equality can be rephrased as $B(n,m) = \frac{\Gamma(n)\Gamma(m)}{\Gamma(n+m)}$ for all 
$n, m \in \mathbb{N}$. In fact, it can be verified that $B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$ for all $x, y \in [1,\infty)$; 
see [Wad00, Theorem 12.69(1)] for a proof.) 


Exercise 5.7.8. The purpose of this exercise is to provide an alternative version of 
Taylor’s Theorem (Theorem 4.4.6). Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded 
interval, let $c \in (a,b)$, let $f\colon [a,b] \to \mathbb{R}$ be a function and let $n \in \mathbb{N} \cup \{0\}$. Suppose 
that $f^{(k)}$ exists and is continuous on $[a,b]$ for each $k \in \{0, \ldots, n+1\}$. Let $x \in [a,b]$. 
Prove that 

\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k + \int_c^x \frac{f^{(n+1)}(t)}{n!} (x - t)^n\,dt. \]

Do not try to deduce this result from Taylor’s Theorem (Theorem 4.4.6); prove it on 
its own, by induction on $n$. 


5.8 Lebesgue’s Theorem 


We saw in Theorem 5.4.11 that all continuous functions are integrable. Non-continuous 
functions may or may not be integrable; the functions in Example 5.2.6 (2) (4) 
are not continuous and are integrable, and the function in Example 5.2.6 (3) is not 
continuous and not integrable. Intuitively, for a function to be integrable, it can be 
discontinuous, but the set of numbers where the function is discontinuous cannot be 
“too large.” This fact can be made precise, as we will see in Theorem 5.8.5 below, 
known as Lebesgue’s Theorem. We start with some preliminaries, beginning with a 
definition that characterizes what we mean when we say that a set is “not too 
large” from the point of view of integration. Our treatment of this material follows 
[Sto01, Section 6.7]. 

To understand the material in this section, the reader must be familiar with 
basic properties of countable and uncountable sets, and with the cardinalities of the 
standard sets of numbers, such as the rational numbers and the real numbers; see 
[Blo10, Sections 6.5—6.7] for details. 
If the functions in Example 5.2.6 (2) (4) are compared with the function in 
Example 5.2.6 (3), the reader will notice that the sets of numbers at which the 
functions are discontinuous in Example 5.2.6 (2) (4) are countable (recall that a finite 
set is considered countable, and the rational numbers are countable), whereas the set 
of numbers at which the function is discontinuous in Example 5.2.6 (3) is uncountable 
(any non-degenerate interval in R is uncountable). It would be tempting to conjecture 
from these examples that a function is integrable if and only if the set of numbers 
at which it is discontinuous is countable, but such a conjecture turns out to be false, 
as the reader is asked to show in Exercise 8.4.11. The issue of countability versus 
uncountability is not the correct way to characterize sets that are “not too large” from 
the point of view of integration. 

Rather than looking at the cardinality of sets, we need to look at the “size” of 
subsets of R. Although a complete characterization of such size, known as Lebesgue 
measure, would take us too far afield, for our present purpose we need to characterize 
only those subsets of R that have “size zero,” known formally as “measure zero,” to 
be defined below. See [Str00, Chapter 14] for a discussion of Lebesgue measure. 

As with many other aspects of calculus, the basic idea in determining the size of 
subsets of R is to approximate more complicated subsets with simpler ones, and in 
particular with sets for which it is easier to compute the size. The subsets of R for 
which size is particularly easy to compute are bounded intervals. For any bounded 
interval, which has the form (a,b), [a,b), (a,b] or [a,b], its length is b — a. For our 
present purposes, it turns out that we can restrict our attention to open bounded 
intervals. The essential idea of a set having “measure zero” is that, rather than dealing 
with zero directly, the set is shown to be “smaller” than any given € > 0, where 
“smaller” is measured in terms of being a subset of an appropriate collection of open 
intervals, the lengths of which add up to less than €. 

Whatever “measure zero” is defined to be, certainly any finite subset of R should 
have measure zero. Clearly, for any finite set, no matter how large, we can always find 
a finite collection of open intervals in R whose union contains the finite set, and such that 
the sum of the lengths of the open intervals is less than any given positive number. It is this last 
fact that implies that every finite set has measure zero. More generally, any subset of 
R that has this property, namely, that for any € > 0, there is a finite collection of open 
intervals that contains the subset, and the sum of the lengths of the open intervals is 
less than €, should be considered as having measure zero. However, and this is the 
subtle insight of Lebesgue’s Theorem, it turns out that to characterize sets as having 
measure zero, it does not suffice to consider only finite collections of open intervals, 
but rather countable collections of open intervals as well. 

A countable collection of open intervals in R can be viewed as a sequence of open 
intervals, and hence, for the following definition, we need the notions of sequences and 
series. These concepts will be discussed in detail in Sections 8.2 and 9.2, respectively, 
but we assume that the reader is familiar with these concepts informally, and that the 
reader will agree to take on faith whatever facts we need about sequences and series 
for now until we deal with them rigorously later on, and in that way we can examine 
Lebesgue’s Theorem in the chapter on integration, where it belongs. 
Definition 5.8.1. Let $A \subseteq \mathbb{R}$ be a set. The set $A$ has measure zero if for each $\varepsilon > 0$, 
there is a sequence $\{(a_n, b_n)\}_{n=1}^{\infty}$ of open bounded intervals in $\mathbb{R}$ such that 
$A \subseteq \bigcup_{n=1}^{\infty} (a_n, b_n)$ and $\sum_{n=1}^{\infty} (b_n - a_n) < \varepsilon$. △


We note that the open intervals of the form $(a_n, b_n)$ used in Definition 5.8.1 are 
allowed to be degenerate, that is, we allow $a_n = b_n$, which means that such intervals 
are empty. Hence, although we use sequences of the form $\{(a_n, b_n)\}_{n=1}^{\infty}$, in those 
cases where all but finitely many of these open intervals are degenerate, we 
really have finite collections of non-degenerate open intervals. We have phrased the 
definition as we did to avoid having to consider two separate cases, namely, finite 
collections of non-degenerate open intervals and sequences of non-degenerate open 
intervals. 


Example 5.8.2. 


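(1) The empty set has measure zero. Let $\varepsilon > 0$. Let $(a_1, b_1) = (0, \frac{\varepsilon}{2})$, and 
let $(a_n, b_n) = (0, 0)$ for all $n \in \mathbb{N}$ such that $n > 1$. Then $\emptyset \subseteq \bigcup_{n=1}^{\infty} (a_n, b_n)$ and 
$\sum_{n=1}^{\infty} (b_n - a_n) = \frac{\varepsilon}{2} < \varepsilon$. 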

(2) Any finite subset of IR has measure zero. The reader is asked to prove this fact 
in Exercise 5.8.1. 

(3) A non-degenerate interval does not have measure zero. Let $I \subseteq \mathbb{R}$ be a non-degenerate 
interval. We consider the case where $I$ is a closed bounded interval; the 
case where $I$ is not a closed bounded interval is left to the reader in Exercise 5.8.2. 

Because we are assuming that $I$ is a non-degenerate closed bounded interval, 
$I = [c,d]$ for some $c, d \in \mathbb{R}$ such that $c < d$. Suppose that $[c,d]$ has measure 
zero. Then there is a sequence $\{(a_n, b_n)\}_{n=1}^{\infty}$ of open bounded intervals in $\mathbb{R}$ such 
that $[c,d] \subseteq \bigcup_{n=1}^{\infty} (a_n, b_n)$ and $\sum_{n=1}^{\infty} (b_n - a_n) < d - c$. By the Heine-Borel Theorem 
(Theorem 2.6.14) there are $n \in \mathbb{N}$ and $i_1, i_2, \ldots, i_n \in \mathbb{N}$ such that 
$[c,d] \subseteq \bigcup_{k=1}^{n} (a_{i_k}, b_{i_k})$. It follows from Exercise 2.5.15 that 

\[ d - c \le \sum_{k=1}^{n} (b_{i_k} - a_{i_k}) \le \sum_{j=1}^{\infty} (b_j - a_j) < d - c, \]

which is a contradiction. Therefore $I$ does not have measure zero. (The Heine-Borel 
Theorem might seem stronger than necessary to prove this result, but it allowed us to 
avoid dealing with series more than absolutely necessary.) 

(4) Any countable subset of $\mathbb{R}$ has measure zero. Let $C \subseteq \mathbb{R}$ be a countable set. 
By definition a countable set is either finite or countably infinite. Finite sets were 
treated in Part (2) of this example, so suppose that $C$ is countably infinite. That is, 
suppose that $C = \{c_1, c_2, c_3, \ldots\}$, where $c_1, c_2, \ldots \in \mathbb{R}$. Let $\varepsilon > 0$. For each $n \in \mathbb{N}$, 
let $(a_n, b_n) = \left( c_n - \frac{\varepsilon}{2^{n+2}},\, c_n + \frac{\varepsilon}{2^{n+2}} \right)$. Then $c_n \in (a_n, b_n)$ for all $n \in \mathbb{N}$, and hence 
$C \subseteq \bigcup_{n=1}^{\infty} (a_n, b_n)$. Because $b_n - a_n = \frac{\varepsilon}{2^{n+1}}$ for all $n \in \mathbb{N}$, we have 

\[ \sum_{n=1}^{\infty} (b_n - a_n) = \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n+1}} = \varepsilon \left( \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots \right). \]

The series $\frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots$ is a geometric series. The reader is likely to be familiar 
with geometric series from calculus courses or before; such series will be treated 
rigorously in Example 9.2.4 (4). From that example it is seen that $\frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots$ 
is convergent, and that $\frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots = \frac{1}{2}$. It follows that 
$\sum_{n=1}^{\infty} (b_n - a_n) = \varepsilon \cdot \frac{1}{2} = \frac{\varepsilon}{2} < \varepsilon$. Hence $C$ has measure zero. 

(5) There are uncountable subsets of $\mathbb{R}$ that have measure zero, though it is hard 
to imagine such sets intuitively. A famous example of an uncountable subset of $\mathbb{R}$ that 
has measure zero is the Cantor set, which will be discussed in detail in Example 8.4.9, 
when we have more tools involving sequences at our disposal. ◊
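
The construction in Part (4) can be sketched computationally for a finite initial segment of a countable set. In the code below, the listing of rationals in $[0,1]$ (which repeats some values), the value of `eps`, and the function name `cover_intervals` are illustrative assumptions; the point is only that the $n$-th point receives an interval of length $\varepsilon/2^{n+1}$, so that the total length stays below $\varepsilon/2$ however many points are covered.

```python
from fractions import Fraction


def cover_intervals(points, eps):
    """Give the n-th point c_n (n = 1, 2, ...) the interval of half-width eps / 2^(n+2)."""
    intervals = []
    for n, c in enumerate(points, start=1):
        half_width = eps / 2 ** (n + 2)
        intervals.append((c - half_width, c + half_width))
    return intervals


# Hypothetical sample: the start of a listing of the rationals in [0, 1] by denominator.
points = [Fraction(p, q) for q in range(1, 8) for p in range(q + 1)]
eps = Fraction(1, 10)

intervals = cover_intervals(points, eps)
# Sum of the interval lengths b_n - a_n = eps / 2^(n+1) over the listed points.
total_length = sum(right - left for left, right in intervals)
```

Because exact fractions are used, `total_length` can be compared directly with $\varepsilon/2$.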


It is evident that if a subset A C R has measure zero, and if B C A, then B has 
measure zero; we omit the details. A less evident, though very useful, fact about sets 
of measure zero is given in the following lemma. 


Lemma 5.8.3. Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of subsets of $\mathbb{R}$. Suppose that $A_n$ has 
measure zero for all $n \in \mathbb{N}$. Then $\bigcup_{n=1}^{\infty} A_n$ has measure zero. 


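Proof. Let $\varepsilon > 0$. Let $n \in \mathbb{N}$. Then there is a sequence $\{(a_k^n, b_k^n)\}_{k=1}^{\infty}$ of open bounded 
intervals in $\mathbb{R}$ such that $A_n \subseteq \bigcup_{k=1}^{\infty} (a_k^n, b_k^n)$ and $\sum_{k=1}^{\infty} (b_k^n - a_k^n) < \frac{\varepsilon}{2^{n+1}}$. We saw in 
Example 5.8.2 (4) that $\sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n+1}} = \frac{\varepsilon}{2}$. 

Let $f\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a bijective function. Such a function exists because $\mathbb{N} \times \mathbb{N}$ 
is countably infinite, which is a standard fact about the cardinality of the number 
systems; see [Blo10, Sections 6.5-6.7] for details. Let $\{(c_n, d_n)\}_{n=1}^{\infty}$ be defined as 
follows. For each $i \in \mathbb{N}$, we have $f(i) = (n_i, k_i)$ for some $n_i, k_i \in \mathbb{N}$, and then let 
$c_i = a_{k_i}^{n_i}$ and $d_i = b_{k_i}^{n_i}$. Because $f$ is surjective, it follows that 
$\bigcup_{n=1}^{\infty} A_n \subseteq \bigcup_{n=1}^{\infty} (c_n, d_n)$. We now use Exercise 9.3.6 to deduce that 
$\sum_{n=1}^{\infty} (d_n - c_n)$ is convergent and 

\[ \sum_{n=1}^{\infty} (d_n - c_n) \le \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n+1}} = \frac{\varepsilon}{2} < \varepsilon. \]
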
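
The bijection $f\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ in the proof is not specified in the text. One possible choice, sketched below, lists the pairs diagonal by diagonal. The function name `unpair` and the sample interval lengths $\varepsilon/2^{n+k+2}$ are assumptions made purely for illustration; any bijection would serve the proof equally well.

```python
from fractions import Fraction


def unpair(i):
    """Send i = 1, 2, 3, ... to a pair (n, k) of positive integers, one diagonal at a time."""
    d = 1  # the d-th diagonal consists of the d pairs (n, k) with n + k = d + 1
    while i > d:
        i -= d
        d += 1
    return (i, d + 1 - i)


# Hypothetical interval lengths b_k^n - a_k^n = eps / 2^(n+k+2). For fixed n they sum
# over k to eps / 2^(n+2), which is less than eps / 2^(n+1) as the proof requires.
eps = Fraction(1, 10)


def length(n, k):
    return eps / 2 ** (n + k + 2)


first_pairs = [unpair(i) for i in range(1, 56)]          # ten complete diagonals
partial_sum = sum(length(n, k) for n, k in first_pairs)  # partial sum of the d_i - c_i
```

Every partial sum of the rearranged lengths stays below $\varepsilon/2$, which is the numerical shadow of the inequality obtained from Exercise 9.3.6.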


Although Lemma 5.8.3 is stated in terms of sequences of subsets of $\mathbb{R}$ of measure 
zero, the lemma also applies to any finite collection of subsets of measure zero. 
If $A_1, A_2, \ldots, A_p$ is a finite collection of subsets of $\mathbb{R}$ of measure zero, we can let 
$A_i = \emptyset$ for all $i \in \mathbb{N}$ such that $i > p$, and then $\{A_n\}_{n=1}^{\infty}$ satisfies the hypotheses of 
the lemma, and so $\bigcup_{n=1}^{\infty} A_n$ has measure zero, and hence $\bigcup_{n=1}^{p} A_n$ has measure zero. 

The following lemma, which relates integration and sets of measure zero, is the 
first step toward Lebesgue’s Theorem. 


Lemma 5.8.4. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and 
let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is integrable, that $f(x) \ge 0$ for all 
$x \in [a,b]$ and that $\int_a^b f(x)\,dx = 0$. Then the set $\{x \in [a,b] \mid f(x) > 0\}$ has measure 
zero. 

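Proof. Let $E = \{x \in [a,b] \mid f(x) > 0\}$. For each $n \in \mathbb{N}$, let $E_n = \{x \in [a,b] \mid f(x) > \frac{1}{n}\}$. 
Clearly $E_n \subseteq E$ for all $n \in \mathbb{N}$. Let $x \in E$. Then $x \in [a,b]$ and $f(x) > 0$. By Corollary 2.6.8 (2) 
there is some $m \in \mathbb{N}$ such that $\frac{1}{m} < f(x)$. Hence $x \in E_m$. It follows that 
$E = \bigcup_{n=1}^{\infty} E_n$. We will show that for each $n \in \mathbb{N}$, the set $E_n$ has measure zero. It will 
then follow from Lemma 5.8.3 that $E$ has measure zero. 

Let $n \in \mathbb{N}$. Let $\varepsilon > 0$. Because $\int_a^b f(x)\,dx = 0$, there is some $\delta > 0$ such that if 
$P$ is a partition of $[a,b]$ with $\|P\| < \delta$, and if $T$ is a representative set of $P$, then 
$|S(f,P,T) - 0| < \frac{\varepsilon}{2n}$. 

Let $R = \{x_0, x_1, \ldots, x_m\}$ be a partition of $[a,b]$ such that $\|R\| < \delta$, which exists 
by Exercise 5.2.1. Let $V = \{i \in \{1, \ldots, m\} \mid E_n \cap [x_{i-1}, x_i] \ne \emptyset\}$. It is evident from the 
definition of $V$ that 

\[ E_n \subseteq \bigcup_{i \in V} [x_{i-1}, x_i] \subseteq \bigcup_{i \in V} (x_{i-1}, x_i) \cup \bigcup_{j=0}^{m} \left( x_j - \frac{\varepsilon}{4(m+1)},\, x_j + \frac{\varepsilon}{4(m+1)} \right). \]

Let $U = \{u_1, \ldots, u_m\}$ be the representative set of $R$ defined as follows. If $i \in V$, 
let $u_i$ be an arbitrary element of $E_n \cap [x_{i-1}, x_i]$; if $i \in \{1, \ldots, m\} - V$, let $u_i$ be an 
arbitrary element of $[x_{i-1}, x_i]$. Then $f(u_i) > \frac{1}{n}$ for all $i \in V$, and $f(u_i) \ge 0$ for all 
$i \in \{1, \ldots, m\} - V$. Then 

\[ \frac{1}{n} \sum_{i \in V} (x_i - x_{i-1}) \le \sum_{i \in V} f(u_i)(x_i - x_{i-1}) \le \sum_{i=1}^{m} f(u_i)(x_i - x_{i-1}) = |S(f,R,U) - 0| < \frac{\varepsilon}{2n}. \]

It follows that $\sum_{i \in V} (x_i - x_{i-1}) < \frac{\varepsilon}{2}$. It is straightforward to verify that 

\[ \sum_{j=0}^{m} \left[ \left( x_j + \frac{\varepsilon}{4(m+1)} \right) - \left( x_j - \frac{\varepsilon}{4(m+1)} \right) \right] = \frac{\varepsilon}{2}. \]

Therefore 

\[ \sum_{i \in V} (x_i - x_{i-1}) + \sum_{j=0}^{m} \left[ \left( x_j + \frac{\varepsilon}{4(m+1)} \right) - \left( x_j - \frac{\varepsilon}{4(m+1)} \right) \right] < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]

We deduce that $E_n$ has measure zero. 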


Lemma 5.8.4 is interesting precisely because there are discontinuous functions 
that are integrable; in Exercise 5.5.7 it was seen that if the function f in Lemma 5.8.4 
is continuous, then {x € [a,b] | f(x) > 0} is the empty set, which has measure zero as 
noted in Example 5.8.2 (1). 

We are now ready to prove the following remarkable characterization of integrable 
functions. 


Theorem 5.8.5 (Lebesgue’s Theorem). Let [a,b] C R be a non-degenerate closed 
bounded interval, and let f : [a,b] — R be a function. Then f is integrable if and only 
if f is bounded and the set of numbers at which f is discontinuous has measure zero. 


Proof. Suppose that f is integrable. Then by Theorem 5.3.3 we know that f is 
bounded. 

Let $\{Q_n\}_{n=1}^{\infty}$ be the sequence of partitions of $[a,b]$ defined as follows. For each 
$k \in \mathbb{N}$, it follows from Theorem 5.4.7 (c) that there is a partition $Q_k$ of $[a,b]$ such 
that $U(f,Q_k) - L(f,Q_k) < \frac{1}{k}$; there is more than one such partition $Q_k$, so we choose 
one. Let $\{P_n\}_{n=1}^{\infty}$ be the sequence of partitions of $[a,b]$ defined by $P_k = \bigcup_{i=1}^{k} Q_i$ for 
all $k \in \mathbb{N}$. 

Let $n \in \mathbb{N}$. Then $P_n$ is a refinement of both $Q_n$ and $P_{n-1}$. It follows that if $m \in \mathbb{N}$ 
and $m > n$, then $P_m$ is a refinement of $P_n$; the proof of this fact requires induction, 
and the details are left to the reader. By Lemma 5.4.6 (2) we see that 
$L(f,Q_n) \le L(f,P_n) \le U(f,P_n) \le U(f,Q_n)$, and hence $U(f,P_n) - L(f,P_n) < \frac{1}{n}$. We denote the 
elements of $P_n$ by $P_n = \{x_0^n, x_1^n, \ldots, x_{p_n}^n\}$. 

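Let $u_n, l_n\colon [a,b] \to \mathbb{R}$ be defined by 

\[ u_n(x) = \begin{cases} M_i^{P_n}(f), & \text{if } x \in (x_{i-1}^n, x_i^n) \text{ for some } i \in \{1, \ldots, p_n\} \\ f(x), & \text{if } x \in P_n \end{cases} \]

and 

\[ l_n(x) = \begin{cases} m_i^{P_n}(f), & \text{if } x \in (x_{i-1}^n, x_i^n) \text{ for some } i \in \{1, \ldots, p_n\} \\ f(x), & \text{if } x \in P_n. \end{cases} \]

Then $u_n$ and $l_n$ are step functions, as defined in Exercise 5.3.4 (2). 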

It is seen from the definition of $u_n$ and $l_n$ that $l_n(x) \le f(x) \le u_n(x)$ for all $x \in [a,b]$. 
Let $i \in \{1, \ldots, p_n\}$. We can compute $M_i^{P_n}(u_n)$ and $m_i^{P_n}(l_n)$, because step functions are 
bounded. Observe that $l_n(x_{i-1}^n) = f(x_{i-1}^n) = u_n(x_{i-1}^n)$ and $l_n(x_i^n) = f(x_i^n) = u_n(x_i^n)$, 
and that if $x \in (x_{i-1}^n, x_i^n)$, then $l_n(x) = m_i^{P_n}(f)$ and $u_n(x) = M_i^{P_n}(f)$. It follows that 
$m_i^{P_n}(l_n) = \min\{f(x_{i-1}^n), m_i^{P_n}(f), f(x_i^n)\} = m_i^{P_n}(f)$, and similarly $M_i^{P_n}(u_n) = M_i^{P_n}(f)$. 
We deduce that 

\[ U(u_n, P_n) = \sum_{i=1}^{p_n} M_i^{P_n}(u_n)(x_i^n - x_{i-1}^n) = \sum_{i=1}^{p_n} M_i^{P_n}(f)(x_i^n - x_{i-1}^n) = U(f, P_n), \]

and similarly $L(l_n, P_n) = L(f, P_n)$. 

Let $m \in \mathbb{N}$. Suppose that $m > n$. Let $x \in [a,b] - P_m$. Then $x \in (x_{t-1}^m, x_t^m)$ for some 
$t \in \{1, \ldots, p_m\}$. Because $P_m$ is a refinement of $P_n$, we have $[x_{t-1}^m, x_t^m] \subseteq [x_{h-1}^n, x_h^n]$ for 
some $h \in \{1, \ldots, p_n\}$. It then follows from Exercise 2.6.1 (1) that 
$u_m(x) = M_t^{P_m}(f) = \operatorname{lub} f([x_{t-1}^m, x_t^m]) \le \operatorname{lub} f([x_{h-1}^n, x_h^n]) = M_h^{P_n}(f) = u_n(x)$. 
A similar argument shows that $l_n(x) \le l_m(x)$. 

Let $s, t \in \mathbb{N}$. Let $y \in [a,b]$. We will show that $l_s(y) \le u_t(y)$. First, suppose that 
$s = t$. Then we have already noted that $l_s(y) \le u_t(y)$. Second, suppose that $s > t$. If 
$y \in [a,b] - P_s$, then combining various facts mentioned above we see that 
$l_s(y) \le u_s(y) \le u_t(y)$; if $y \in P_s$, then $l_s(y) = f(y) \le u_t(y)$. Third, suppose that $s < t$. This case 
is similar to the previous case, and we omit the details. 

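Let $u, l\colon [a,b] \to \mathbb{R}$ be defined as follows. Let $x \in [a,b]$. Let 

\[ \mathcal{L}_x = \{l_n(x) \mid n \in \mathbb{N}\} \quad\text{and}\quad \mathcal{U}_x = \{u_n(x) \mid n \in \mathbb{N}\}. \]

Clearly $\mathcal{L}_x$ and $\mathcal{U}_x$ are both non-empty subsets of $\mathbb{R}$. By the previous paragraph, 
we know that $\mathcal{L}_x$ and $\mathcal{U}_x$ satisfy the hypotheses of Part (1) of the No Gap Lemma 
(Lemma 2.6.6), and hence $\mathcal{L}_x$ has a least upper bound and $\mathcal{U}_x$ has a greatest lower 
bound, and $\operatorname{lub} \mathcal{L}_x \le \operatorname{glb} \mathcal{U}_x$. Let $u(x) = \operatorname{glb} \mathcal{U}_x$ and $l(x) = \operatorname{lub} \mathcal{L}_x$. We have therefore 
defined the functions $u$ and $l$, and we see that $l(x) \le u(x)$ for all $x \in [a,b]$. 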




We now show that $u$ and $l$ are integrable, and that $\int_a^b u(x)\,dx = \int_a^b f(x)\,dx = \int_a^b l(x)\,dx$. 
First, observe that $l_1(x) \le l(x) \le u(x) \le u_1(x)$ for all $x \in [a,b]$, and that $l_1$ 
and $u_1$ are bounded. It follows from Exercise 3.2.14 that $u$ and $l$ are bounded. Let 
$\varepsilon > 0$. By Corollary 2.6.8 (2) there is some $m \in \mathbb{N}$ such that $\frac{1}{m} < \varepsilon$. Then 
$U(f,P_m) - L(f,P_m) < \frac{1}{m} < \varepsilon$. We know that $l_m(x) \le l(x) \le u(x) \le u_m(x)$ for all $x \in [a,b]$, and 
then by Exercise 5.4.6 and Lemma 5.4.6 (1) we deduce that 
$L(l_m,P_m) \le L(l,P_m) \le U(l,P_m) \le U(u_m,P_m)$. We saw above that $L(l_m,P_m) = L(f,P_m)$ and 
$U(u_m,P_m) = U(f,P_m)$, and hence $L(f,P_m) \le L(l,P_m) \le U(l,P_m) \le U(f,P_m)$. A similar argument 
shows that $L(f,P_m) \le L(u,P_m) \le U(u,P_m) \le U(f,P_m)$. It now follows from Exercise 5.4.11 
that $u$ and $l$ are integrable, and that $\int_a^b u(x)\,dx = \int_a^b f(x)\,dx = \int_a^b l(x)\,dx$. 
By Theorem 5.3.1 (2) we know that $u - l$ is integrable, and that 
$\int_a^b [u - l](x)\,dx = \int_a^b u(x)\,dx - \int_a^b l(x)\,dx = 0$. We also know that $(u - l)(x) \ge 0$ for all $x \in [a,b]$. It now 
follows from Lemma 5.8.4 that the set $\{x \in [a,b] \mid (u - l)(x) > 0\}$ has measure zero. 
Let 

\[ F = \{x \in [a,b] \mid (u - l)(x) > 0\} \cup \bigcup_{n=1}^{\infty} P_n. \]

For each $n \in \mathbb{N}$, the set $P_n$ is finite, and hence it has measure zero by Example 5.8.2 (2). 
It follows from Lemma 5.8.3 that $F$ has measure zero. 

We now show that $f$ is continuous at all numbers in $[a,b] - F$. Let $c \in [a,b] - F$. 
Let $\varepsilon > 0$. By the definition of $F$ we know that $(u - l)(c) = 0$, which means that 
$u(c) = l(c)$. Because $l(c) = \operatorname{lub} \mathcal{L}_c$, we can use Lemma 2.6.5 (1) to deduce that there 
is some $v \in \mathbb{N}$ such that $l(c) - \frac{\varepsilon}{2} < l_v(c) \le l(c)$. Similarly, there is some $k \in \mathbb{N}$ such 
that $u(c) \le u_k(c) < u(c) + \frac{\varepsilon}{2}$. Suppose that $v \ge k$; the case where $v < k$ is similar, 
and we omit the details. Because $c \notin \bigcup_{n=1}^{\infty} P_n$, we have $c \in [a,b] - P_v$, and, as noted 
previously, it follows that $u(c) \le u_v(c) \le u_k(c)$. Because $l(c) = u(c)$, we deduce that 
$u(c) - \frac{\varepsilon}{2} < l_v(c) \le u(c) \le u_v(c) < u(c) + \frac{\varepsilon}{2}$. Therefore $u_v(c) - l_v(c) < \varepsilon$. 

Because $c \in [a,b] - P_v$, it follows that $c \in (x_{i-1}^v, x_i^v)$ for some $i \in \{1, \ldots, p_v\}$. By 
Lemma 2.3.7 (2) there is some $\delta > 0$ such that $(c - \delta, c + \delta) \subseteq (x_{i-1}^v, x_i^v)$. 

Suppose that $x \in [a,b]$ and $|x - c| < \delta$. Then $x \in (x_{i-1}^v, x_i^v)$, and it follows from the 
definition of $u_v$ and $l_v$ as step functions that $u_v(x) = u_v(c)$ and $l_v(x) = l_v(c)$. We know 
that $l_v(c) \le f(c) \le u_v(c)$ and $l_v(x) \le f(x) \le u_v(x)$, and hence $l_v(c) \le f(x) \le u_v(c)$. 
It follows that $|f(x) - f(c)| \le u_v(c) - l_v(c) < \varepsilon$. We deduce that $f$ is continuous at $c$. 

Let G be the set of numbers at which f is discontinuous. Because f is continuous 
at all numbers in [a,b] — F, it follows that G C F. Because F has measure zero, then 
G has measure zero, and that completes this part of the proof. 

Now suppose that f is bounded and that the set of numbers at which f is discon- 
tinuous has measure zero. 

Let € > 0. Because f is bounded, there is some M € R such that | f(x)| < M for 
all x € [a,b]; we may assume that M > 0. Let D be the set of numbers at which f is 
discontinuous. Then D has measure zero. Hence there is a sequence {(dn,bn)};_, of 
open bounded intervals in R such that D C Ur) (dn, bn) and Yr, (bn — an) < Fy- 

Let x € [a,b] —D. (We note that [a,b] — D £ 0, because [a, b] does not have measure 
zero by Example 5.8.2 (3), and hence D ¥ |a,b].) Then f is continuous at x. Hence 
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there is some 6, > 0 such that y € [a,b] and |y —x| < 6, imply | f(y) — f(x)| < ba)" 


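Suppose that $y, z \in [a,b] \cap (x - \delta_x, x + \delta_x)$. Then $|y - x| < \delta_x$ and $|z - x| < \delta_x$, and it
follows that
\[
|f(z) - f(y)| = |f(z) - f(x) + f(x) - f(y)| \le |f(z) - f(x)| + |f(x) - f(y)|
< \frac{\varepsilon}{4(b-a)} + \frac{\varepsilon}{4(b-a)} = \frac{\varepsilon}{2(b-a)}.
\]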


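Because $D \subseteq \bigcup_{n=1}^{\infty} (a_n, b_n)$, then
\[
[a,b] = ([a,b] - D) \cup D \subseteq \bigcup_{x \in [a,b] - D} \left(x - \tfrac{\delta_x}{2}, x + \tfrac{\delta_x}{2}\right) \cup \bigcup_{n=1}^{\infty} (a_n, b_n).
\]
The Heine–Borel Theorem (Theorem 2.6.14) implies that there are $p, q \in \mathbb{N}$, and
$x_1, x_2, \ldots, x_p \in [a,b] - D$, and $n_1, n_2, \ldots, n_q \in \mathbb{N}$ such that
\[
[a,b] \subseteq \bigcup_{i=1}^{p} \left(x_i - \tfrac{\delta_{x_i}}{2}, x_i + \tfrac{\delta_{x_i}}{2}\right) \cup \bigcup_{k=1}^{q} (a_{n_k}, b_{n_k}).
\]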


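Let
\[
Q = \left\{a, b, x_1 - \tfrac{\delta_{x_1}}{2}, x_1 + \tfrac{\delta_{x_1}}{2}, \ldots, x_p - \tfrac{\delta_{x_p}}{2}, x_p + \tfrac{\delta_{x_p}}{2}, a_{n_1}, b_{n_1}, \ldots, a_{n_q}, b_{n_q}\right\}.
\]
Let $P = Q \cap [a,b]$. Arrange the elements of $P$ in increasing order, and rename them
$y_0, y_1, \ldots, y_n$. Then $P = \{y_0, y_1, \ldots, y_n\}$ is a partition of $[a,b]$. Let $i \in \{1, \ldots, n\}$. Then
$(y_{i-1}, y_i)$ is a subset of at least one interval of the form $\left(x_j - \tfrac{\delta_{x_j}}{2}, x_j + \tfrac{\delta_{x_j}}{2}\right)$ or of the
form $(a_{n_k}, b_{n_k})$. Let
\[
V = \{i \in \{1, \ldots, n\} \mid (y_{i-1}, y_i) \subseteq (a_{n_k}, b_{n_k}) \text{ for some } k \in \{1, \ldots, q\}\},
\]
and let $W = \{1, \ldots, n\} - V$.

Let $k \in W$. Then $(y_{k-1}, y_k) \subseteq \left(x_j - \tfrac{\delta_{x_j}}{2}, x_j + \tfrac{\delta_{x_j}}{2}\right)$ for some $j \in \{1, \ldots, p\}$. It
follows that $[y_{k-1}, y_k] \subseteq (x_j - \delta_{x_j}, x_j + \delta_{x_j})$. If $z, w \in [y_{k-1}, y_k]$, then $z, w \in [a,b] \cap
(x_j - \delta_{x_j}, x_j + \delta_{x_j})$, and, as we saw above, it follows that $|f(z) - f(w)| < \frac{\varepsilon}{2(b-a)}$. By
Exercise 5.4.9 (3) we see that $M_k(f) - m_k(f) \le \frac{\varepsilon}{2(b-a)}$. Therefore
\[
\sum_{i \in W} [M_i(f) - m_i(f)](y_i - y_{i-1}) \le \sum_{i \in W} \frac{\varepsilon}{2(b-a)} (y_i - y_{i-1}) \le \frac{\varepsilon}{2(b-a)} \sum_{i=1}^{n} (y_i - y_{i-1})
= \frac{\varepsilon}{2(b-a)} (b - a) = \frac{\varepsilon}{2}.
\]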


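Let $r \in V$. Because $|f(x)| \le M$ for all $x \in [a,b]$, it follows from Exercise 5.4.9 (4)
that $M_r(f) - m_r(f) \le 2M$. Therefore
\[
\sum_{i \in V} [M_i(f) - m_i(f)](y_i - y_{i-1}) \le 2M \sum_{i \in V} (y_i - y_{i-1}) \le 2M \sum_{n=1}^{\infty} (b_n - a_n) < 2M \cdot \frac{\varepsilon}{4M} = \frac{\varepsilon}{2}.
\]
Putting the above calculations together we conclude that
\[
\begin{aligned}
U(f,P) - L(f,P) &= \sum_{i=1}^{n} [M_i(f) - m_i(f)](y_i - y_{i-1}) \\
&= \sum_{i \in V} [M_i(f) - m_i(f)](y_i - y_{i-1}) + \sum_{i \in W} [M_i(f) - m_i(f)](y_i - y_{i-1}) \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{aligned}
\]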


Therefore f satisfies the criterion given in Theorem 5.4.7 (c), and hence f is integrable. 


The following corollary is an immediate consequence of Lebesgue’s Theorem 
(Theorem 5.8.5) combined with Example 5.8.2 (4). 


Corollary 5.8.6. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$f\colon [a,b] \to \mathbb{R}$ be a function. If $f$ is bounded and is discontinuous at countably many
numbers, then $f$ is integrable.


Corollary 5.8.6 immediately implies that the function in Example 5.2.6 (4) is inte- 
grable, though of course proving this corollary, which requires Lebesgue’s Theorem 
(Theorem 5.8.5), requires more work than going through the details of that example, 
so no real effort has been saved. However, from now on, we can easily treat any 
similar such examples. Moreover, Lebesgue’s Theorem can be used to give alternative, 
and often simpler, proofs of various theorems that we have already seen, for example 
Theorem 5.5.1 (1) and Theorem 5.5.4 (2); the reader is asked to provide such proofs 
in Exercise 5.8.4 and Exercise 5.8.5, respectively. 

Finally, we mention that Lebesgue’s Theorem refers to Riemann integration only, 
not to other types of integration. For example, although Lebesgue integration agrees 
with Riemann integration for continuous functions, there are some very discontinuous 
functions that are Lebesgue integrable even though they are not Riemann integrable, 
for example the function given in Example 5.2.6 (3). See [Str00, Chapter 14] for a 
discussion of Lebesgue integration. We will not refer to Lebesgue integration further 
in this text. 


Reflections 


The proof of Lebesgue’s Theorem is the lengthiest, and possibly the trickiest, 
proof in this book, and it is certainly acceptable to skip this proof upon first reading. 
Indeed, one can have a solid understanding of introductory real analysis without 
knowing Lebesgue’s Theorem at all. However, it is hard to skip the statement of 
this theorem and feel that one has a good grasp of the nature of integrable functions. 
In contrast to differentiable functions, where we have a simple intuitive idea of 
differentiability in terms of graphs of functions not having “corners” or vertical 
tangent lines, there is no correspondingly simple picture of what makes a function
integrable. Lebesgue’s Theorem provides the closest thing we have to an intuitive
understanding of integrable functions. As for the proof of Lebesgue’s Theorem, 
though skipping it is understandable, the reader is encouraged to work through it, to 
appreciate the cleverness involved, and to see precisely how the notion of measure 
zero is used in the proof. Moreover, though real analysis would be a very tedious 
subject if all the proofs were as long as that of Lebesgue’s Theorem, working through 
a long and difficult proof on occasion is a worthwhile endeavor somewhat similar 
to challenging physical exercise—getting in shape sometimes requires us to push 
ourselves to the limit of what we had thought possible. 


Exercises 


Exercise 5.8.1. [Used in Example 5.8.2.] Prove that any finite subset of $\mathbb{R}$ has measure
zero.


Exercise 5.8.2. [Used in Example 5.8.2.] Complete the proof of Example 5.8.2 (3). 
That is, treat the case where J is not a closed bounded interval. 


Exercise 5.8.3. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is integrable, that $f(x) \ge 0$ for all
$x \in [a,b]$ and that the set $\{x \in [a,b] \mid f(x) > 0\}$ does not have measure zero. Prove
that $\int_a^b f(x)\,dx > 0$.


Exercise 5.8.4. [Used in Section 5.8.] Use Lebesgue’s Theorem (Theorem 5.8.5) to 
give an alternative (and shorter) proof of Theorem 5.5.1 (1). 


Exercise 5.8.5. [Used in Section 5.8.] Use Lebesgue’s Theorem (Theorem 5.8.5) to 
give an alternative (and shorter) proof of Theorem 5.5.4 (2). 


Exercise 5.8.6. Use Lebesgue’s Theorem (Theorem 5.8.5) to give an alternative (and 
shorter) proof of the first part of Theorem 5.5.5, which is the fact that if f is integrable, 
then |f| is integrable. (Do not try to prove the inequality in Theorem 5.5.5 using 
Lebesgue’s Theorem.) 


Exercise 5.8.7. [Used in Section 5.6.] Find an example of a function with domain a 
non-degenerate closed bounded interval that is integrable and has an antiderivative, 
but that is not continuous. It is acceptable if the lack of continuity is asserted without 
proof, but a proof must be provided for the other two properties. One way to construct 
an example is to modify Example 4.2.5 (1). 


Exercise 5.8.8. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is monotone. The purpose of this exercise
is to prove that the set of numbers at which $f$ is discontinuous is countable. It will
then follow from Example 5.8.2 (4) and Lebesgue’s Theorem (Theorem 5.8.5) that
a monotone function is integrable. (A previous proof that a monotone function is
integrable was given in Exercise 5.4.12, using Theorem 5.4.7 rather than Lebesgue’s
Theorem; the present proof, though a bit longer, yields more information, because we
learn about the set of numbers at which a monotone function is discontinuous.)

Suppose that $f$ is increasing; the other case is similar, and we omit the details. If
$f(a) = f(b)$, then the function is constant, and hence continuous. Now suppose that
$f(a) < f(b)$.


For convenience, if $c \in [a,b]$ we will write $f_-(c)$ to denote $\lim_{x \to c^-} f(x)$, and $f_+(c)$ to
denote $\lim_{x \to c^+} f(x)$, where we replace $\lim_{x \to c^-} f(x)$ with $f(a)$ when $c = a$, and we replace
$\lim_{x \to c^+} f(x)$ with $f(b)$ when $c = b$.


(1) Let $c \in [a,b]$. Prove that $f_-(c)$ exists and $f_-(c) \le f(c)$; prove that $f_+(c)$ exists
and $f(c) \le f_+(c)$. [Use Exercise 4.5.10 (1).]

(2) Let $c \in [a,b]$. Prove that $f$ is discontinuous at $c$ if and only if $f_-(c) < f_+(c)$.

(3) Let $c, d \in [a,b]$. Suppose that $c < d$. Prove that $f_+(c) \le f_-(d)$.

(4) Let $E = \{r \in [a,b] \mid f \text{ is discontinuous at } r\}$. Let $h\colon E \to \mathbb{Q}$ be defined as
follows. Let $r \in E$. By Part (2) of this exercise we know that $f_-(r) < f_+(r)$. It
follows from Theorem 2.6.13 (1) that there is some $q \in \mathbb{Q}$ such that $f_-(r) <
q < f_+(r)$. We then let $h(r) = q$. There will always be more than one possible
value of $q$, but we choose one such value arbitrarily. Prove that $h$ is injective.
Because $\mathbb{Q}$ is countable, it follows from standard facts about countable sets
that $E$ is also countable; see [Blo10, Sections 6.5 and 6.6] for information
about countable sets.


5.9 Area and Arc Length


The geometric motivation for integration is the need to find the area of curved regions
of the plane. And yet, in the discussion of area via integration in calculus courses, a 
substantial aspect of the study of area is always skipped over. In calculus courses it is 
taken for granted that the integral of a non-negative function equals the area under 
the graph of the function, but a rigorous study of area requires a proof of this fact. Of 
course, such a proof requires that we start with a rigorous definition of the concept of 
area of subsets of the plane, and such a definition is precisely what is glossed over in 
calculus courses—because it involves technicalities not available in such courses, but 
which are available to us. 

In Section 4.5 we saw the definition of some geometric properties of graphs of 
functions, for example increasing and decreasing. These properties were defined with- 
out reference to calculus, though it was then proved that for differentiable functions, it 
is possible to use derivatives to provide an easier way to verify whether or not a func- 
tion satisfies these properties. We now have an analogous situation involving integrals. 
More specifically, we discuss the concepts of area and arc length, both of which will 
be defined without reference to calculus, and both of which can be computed much 
more easily using calculus in the case of integrable functions. In contrast to concepts 
such as increasing and decreasing, which are quite easy to define geometrically, the 
geometric definitions of area and arc length are rather tricky, making use of least 
upper bounds and greatest lower bounds.

We start with a discussion of area. Suppose that $A \subseteq \mathbb{R}^2$ is a set. We would like
to associate to the set A a number called the area of A. It will turn out that it is not 
possible to do so for every set A, as will be seen in Example 5.9.7 (2). The basic idea 
for finding the area of the set A is to try to approximate A with polygons, the areas of 
which are easy to compute, and then take some sort of limit. However, it would be 
possible to do such an approximation in one of two ways, using either polygons that 
are contained in A, or polygons that contain A; in the former case the “limit” would 
be computed via a least upper bound, and in the latter case via a greatest lower bound. 
To make sure that there are polygons that contain the set A, we restrict our attention 
to bounded subsets of $\mathbb{R}^2$, as will be defined below. If the set $A$ is to have something
that we would want to call area, then it ought to be the case that the same result is 
obtained using polygons contained in A and polygons that contain A, and we will 
define the area of A only when we have such equality. To make things as simple as 
possible technically, and in order to avoid having to define polygons in general, we 
restrict our attention to those polygons that are made up out of rectangles that have 
edges that are parallel to the x-axis and the y-axis. 


Definition 5.9.1. Let $S \subseteq \mathbb{R}^2$.


1. The set $S$ is bounded if there are closed bounded intervals $[a,b], [c,d] \subseteq \mathbb{R}$
such that $S \subseteq [a,b] \times [c,d]$.

2. The set $S$ is a rectangle if $S = [a,b] \times [c,d]$ for some closed bounded intervals
$[a,b], [c,d] \subseteq \mathbb{R}$.

3. Suppose that $S$ is a rectangle. Then $S = [a,b] \times [c,d]$ for some closed bounded
intervals $[a,b], [c,d] \subseteq \mathbb{R}$. The interior of $S$ is the set $(a,b) \times (c,d)$. The
rectangle $S$ is non-degenerate if $[a,b]$ and $[c,d]$ are both non-degenerate
intervals. △


Our use of the term “rectangle” here is restricted, for convenience, to those 
rectangles in the plane that have edges that are parallel to the x-axis and y-axis; we 
will not use any other type of rectangle. 

Observe that the closed bounded intervals in the definition of rectangles are 
allowed to be degenerate, meaning single points, which therefore means that rectangles 
are themselves allowed to be degenerate, meaning vertical line segments of the form
$[a,a] \times [c,d]$, horizontal line segments of the form $[a,b] \times [c,c]$ and single points of
the form $[a,a] \times [c,c]$. Degenerate rectangles have empty interiors. Although line
segments and points are not normally considered rectangles, viewing them as such 
allows us to avoid some special cases. 


Definition 5.9.2. A special polygon is a collection of finitely many rectangles in
$\mathbb{R}^2$ such that the interiors of any two of the rectangles are disjoint. If $S$ is a special
polygon, the underlying space of $S$, denoted $U(S)$, is the union of the rectangles in
$S$. △


A rectangle in $\mathbb{R}^2$ can be thought of as a special polygon that has one element.
See Figure 5.9.1 for an example of a special polygon that has four rectangles, one
of which is degenerate. Observe that the rectangles that make up a special polygon
are not required to touch each other, so that a special polygon need not be “connected”
(a term we have not defined, and will not be using rigorously).


Fig. 5.9.1.


We now turn to the areas of rectangles and special polygons, which are very 
simple to define, but which are the basis for finding the areas of more complicated 
subsets of $\mathbb{R}^2$.


Definition 5.9.3. Let $S \subseteq \mathbb{R}^2$.


1. Suppose that $S$ is a rectangle. Then $S = [a,b] \times [c,d]$ for some closed bounded
intervals $[a,b], [c,d] \subseteq \mathbb{R}$. The area of $S$, denoted $A(S)$, is defined by $A(S) =
(b-a)(d-c)$.

2. Suppose that $S$ is a special polygon. Then $S$ is a collection of finitely many
rectangles in $\mathbb{R}^2$ such that the interiors of any two of the rectangles are disjoint.
The area of $S$, denoted $A(S)$, is the sum of the areas of the rectangles in $S$. △


In order to avoid being distracted by some elementary, but tedious, proofs involv- 
ing rectangles in the plane, we will state without proof some properties of special 
polygons that we will need. Suppose that $P, Q \subseteq \mathbb{R}^2$ are special polygons. Properly
speaking, if we write $P \cup Q$, that does not define a special polygon, because some of
the rectangles in $P$ and $Q$ might have interiors that are not disjoint. However, it can
be shown (by dividing up the rectangles in each of $P$ and $Q$) that there is a special
polygon $W \subseteq \mathbb{R}^2$ such that $U(W) = U(P) \cup U(Q)$. In order to avoid cumbersome
notation, we will abuse notation and write “$P \cup Q$” to mean the special polygon $W$.
Hence $U(P \cup Q) = U(P) \cup U(Q)$. A similar idea holds for “$P \cap Q$” and “$P - Q$.” Also,
if $x \in U(Q) - U(P)$, it can be shown that there is a non-degenerate rectangle $R \subseteq \mathbb{R}^2$
such that $x \in R \subseteq U(Q) - U(P)$. Additionally, it can be shown that the expected area
formulas for special polygons hold, for example $A(P \cup Q) = A(P) + A(Q) - A(P \cap Q)$,
and if $U(P) \subseteq U(Q)$, then $A(P) \le A(Q)$ and $A(Q - P) = A(Q) - A(P)$.

The basic idea for proving the above facts about special polygons is that if the 
rectangles in a special polygon are broken up into subrectangles by intersecting 
the original rectangles with a horizontal or vertical line, then the special polygon 
consisting of the smaller rectangles will have the same underlying space and the 
same area as the original special polygon. Then, when we wish to take the union, 
intersection or set difference of two special polygons, we first break up the two special 
polygons using the lines containing the edges of all the rectangles in the two special 
polygons, and then the intersection of the underlying spaces of the original special
polygons consists of a (possibly empty) union of rectangles that are in both of the
new special polygons.
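
The subdivision idea just described can be tried out on a small case. The following Python sketch is an illustration only and is not part of the formal development: storing a rectangle $[a,b] \times [c,d]$ as a tuple `(a, b, c, d)` and the function name `union_area` are assumptions made for this example. The sketch cuts the plane along every line that contains an edge of some rectangle, and then adds the areas of the resulting cells that lie in at least one rectangle.

```python
from fractions import Fraction


def union_area(rects):
    """Area of the union of finitely many axis-parallel rectangles.

    Each rectangle [a,b] x [c,d] is stored as a tuple (a, b, c, d).
    The plane is cut along every vertical and horizontal line containing
    an edge of some rectangle, so each grid cell is either contained in a
    given rectangle or meets it only along its boundary.
    """
    xs = sorted({x for a, b, _, _ in rects for x in (a, b)})
    ys = sorted({y for _, _, c, d in rects for y in (c, d)})
    total = Fraction(0)
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # Count the cell [x0,x1] x [y0,y1] once if some rectangle contains it.
            if any(a <= x0 and x1 <= b and c <= y0 and y1 <= d
                   for a, b, c, d in rects):
                total += (x1 - x0) * (y1 - y0)
    return total


P = [(Fraction(0), Fraction(2), Fraction(0), Fraction(1))]
Q = [(Fraction(1), Fraction(3), Fraction(0), Fraction(2))]
overlap = [(Fraction(1), Fraction(2), Fraction(0), Fraction(1))]  # P ∩ Q, found by hand

# The expected formula A(P ∪ Q) = A(P) + A(Q) - A(P ∩ Q) holds in this case.
assert union_area(P + Q) == union_area(P) + union_area(Q) - union_area(overlap)
```

Exact rational arithmetic via `Fraction` is used so that the equality of areas is tested exactly rather than up to rounding.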

The reader might be concerned that the use of such geometric facts about rectan- 
gles in the plane is outside the framework of our axioms for the real numbers, but 
in fact even here the proofs ultimately rely upon our axioms. There are two main 
ingredients needed to make such a proof rigorous. The first is that the intersection 
of two rectangles is also a rectangle (recall that all of our rectangles have edges 
that are parallel to the x-axis and y-axis). Although this fact seems intuitively clear 
geometrically, it is really not a geometric statement at all, but a general fact about 
intersections of products of sets; see [Blo10, Theorem 3.3.12] for this fact about sets.

The second ingredient is more specific to the real numbers. In order to verify 
the above statements about breaking up the rectangles in a special polygon, we 
would need to proceed one step at a time, in which we take a single rectangle, 
and break it up into two rectangles by intersecting it with a horizontal or vertical 
line, and verifying that the two new rectangles have the same combined area as 
the original rectangle. For example, suppose that $S = [a,b] \times [c,d]$ for some closed
bounded intervals $[a,b], [c,d] \subseteq \mathbb{R}$, and we wish to break up $S$ into two rectangles
using the vertical line $x = p$. If $[a,b]$ is degenerate, or if $p$ is not in the interior
of $[a,b]$, then there is nothing to do, so suppose that $[a,b]$ is non-degenerate and
that $p \in (a,b)$. We break up the rectangle $S$ into two rectangles, and form a new
special polygon $T = \{[a,p] \times [c,d], [p,b] \times [c,d]\}$. We then need the Trichotomy Law
to prove that $[a,p] \cup [p,b] = [a,b]$, which in turn is needed to prove that $U(T) =
([a,p] \times [c,d]) \cup ([p,b] \times [c,d]) = ([a,p] \cup [p,b]) \times [c,d] = [a,b] \times [c,d] = U(S)$, and
we need the Distributive Law and the Associative, Commutative, Identity and Inverses
Laws for Addition to prove that $A(T) = (p-a)(d-c) + (b-p)(d-c) = [(p-a) +
(b-p)](d-c) = (b-a)(d-c) = A(S)$. What appears to be a geometric fact about
rectangles in the plane is really an application of the properties of the real numbers.
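
The single splitting step can be checked on concrete numbers. The sketch below is only an illustration; the tuple representation `(a, b, c, d)` for $[a,b] \times [c,d]$ and the names `area` and `split_at_vertical_line` are hypothetical choices made for this example, not notation from the text.

```python
from fractions import Fraction


def area(rect):
    """Area (b - a)(d - c) of the rectangle [a,b] x [c,d], stored as (a, b, c, d)."""
    a, b, c, d = rect
    return (b - a) * (d - c)


def split_at_vertical_line(rect, p):
    """Break [a,b] x [c,d] along the line x = p, when p lies in (a, b)."""
    a, b, c, d = rect
    if not (a < p < b):
        return [rect]  # nothing to do: p is not in the interior of [a,b]
    return [(a, p, c, d), (p, b, c, d)]


S = (Fraction(0), Fraction(3), Fraction(1), Fraction(2))
T = split_at_vertical_line(S, Fraction(5, 4))

# (p - a)(d - c) + (b - p)(d - c) = (b - a)(d - c)
assert sum(area(r) for r in T) == area(S)
```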

Assuming the above facts about special polygons, we now find the area of a more
general subset of $\mathbb{R}^2$ by looking at the least upper bound of the areas of the special
polygons that have underlying spaces contained in the set, and the greatest lower 
bound of the areas of the special polygons that have underlying spaces containing the 
set. The following definition and lemma are the first step in this process. 


Definition 5.9.4. Let $S \subseteq \mathbb{R}^2$ be a non-empty set. Suppose that $S$ is bounded. Let
\[ I_S = \{A(P) \mid P \text{ is a special polygon in } \mathbb{R}^2 \text{ such that } U(P) \subseteq S\} \]
and
\[ O_S = \{A(Q) \mid Q \text{ is a special polygon in } \mathbb{R}^2 \text{ such that } S \subseteq U(Q)\}. \]
The inner content of $S$, denoted $IC(S)$, is defined by $IC(S) = \operatorname{lub} I_S$, and the outer
content of $S$, denoted $OC(S)$, is defined by $OC(S) = \operatorname{glb} O_S$. △


The first part of the following lemma shows that Definition 5.9.4 makes sense. 
Lemma 5.9.5. Let $S \subseteq \mathbb{R}^2$ be a non-empty set. Suppose that $S$ is bounded.


1. $I_S$ has a least upper bound, and $O_S$ has a greatest lower bound.




2. $IC(S) \le OC(S)$.
3. $IC(S) = OC(S)$ if and only if for each $\varepsilon > 0$, there are special polygons $P$ and
$Q$ in $\mathbb{R}^2$ such that $U(P) \subseteq S \subseteq U(Q)$ and $A(Q) - A(P) < \varepsilon$.


Proof. We prove the three parts of the lemma together. Because $S \ne \emptyset$, there is some
$(x,y) \in S$. Hence the special polygon $V = \{[x,x] \times [y,y]\}$ has the property $U(V) \subseteq S$,
and therefore $A([x,x] \times [y,y]) = 0$ is in $I_S$. Because $S$ is bounded, there is some
rectangle $[a,b] \times [c,d] \subseteq \mathbb{R}^2$ such that $S \subseteq [a,b] \times [c,d]$. Hence the special polygon
$W = \{[a,b] \times [c,d]\}$ has the property $S \subseteq U(W)$, and therefore $A([a,b] \times [c,d]) =
(b-a)(d-c)$ is in $O_S$. Therefore $I_S$ and $O_S$ are non-empty. Let $P, Q \subseteq \mathbb{R}^2$ be special
polygons such that $U(P) \subseteq S \subseteq U(Q)$. Then $A(P) \le A(Q)$. Hence, if $a \in I_S$ and
$b \in O_S$, then $a \le b$. The three parts of the lemma now follow immediately from the
No Gap Lemma (Lemma 2.6.6).


We are now ready to give the definition of the area of subsets of $\mathbb{R}^2$, when such
areas exist.


Definition 5.9.6. Let $S \subseteq \mathbb{R}^2$ be a non-empty set. The set $S$ is squarable if $S$ is
bounded and $IC(S) = OC(S)$. If $S$ is squarable, the area of $S$, denoted $A(S)$, is defined
by $A(S) = IC(S) = OC(S)$. △


Observe that the term “squarable” is analogous to “differentiable” and “integrable,” 
whereas the term “area” is analogous to “derivative” and “integral.” (There would 
be a closer analogy if we used the term “area-able” rather than “squarable,” but the
former term does not, fortunately, appear in the literature, whereas “squarable” does.)
The concept of squarability is the 2-dimensional case of the more general concept
of Jordan measurable, which is defined for subsets of $\mathbb{R}^n$ for all $n \in \mathbb{N}$. Although the
concept of squarability is the most elementary way to define area of subsets of $\mathbb{R}^2$, and
hence we are using it here, the more general concept of Jordan measure is not widely 
studied today, because it has been superseded by the more powerful, though slightly 
more difficult to define, concept of Lebesgue measure. This latter type of measure is 
used as the basis for the definition of the Lebesgue integral, which is widely used in 
more advanced treatments of real analysis. See [Str00, Chapter 14] for an exposition 
of Lebesgue measure and integration. 

It is hard to evaluate the squarability of most subsets of $\mathbb{R}^2$ directly from the
definition, and hence the following examples, where we can make such an evaluation, 
might seem to be either uninteresting or a lot of work for nothing, but it is the best we 
can do. 


Example 5.9.7. 


(1) Let $S \subseteq \mathbb{R}^2$ be the triangle with vertices $(0,0)$, $(1,0)$ and $(0,1)$. The reader is
familiar with the formula for the area of a triangle, and according to that formula the
area of this triangle is $\frac{1}{2}$. However, we have not given a proof of that area formula
using the definition of area in Definition 5.9.6; we need to show that $S$ is squarable
using only what we have stated so far regarding squarability. The reader might then
suggest the following argument, which is really just the proof of the area formula for
triangles in our particular case: we know that the area of the unit square is 1, which
holds because a square is a rectangle, and we know that the area of the triangle $S$ is
half the area of the unit square, and therefore the area of $S$ must be $\frac{1}{2}$. However, such
reasoning requires a number of assumptions that, while ultimately correct, would 
need to be proved, including the fact that the triangle is squarable (not every subset of 
a squarable set is squarable, as will be seen in Part (2) of this example); the fact that 
congruent subsets of the plane have equal areas; and the fact that the area of the union 
of two subsets of the plane that do not have any overlap of their interiors is the sum 
of the areas of the two parts. Proving all of those facts would take more effort than 
showing directly in the present case that S is squarable. 

For each $\varepsilon > 0$, we will construct special polygons $P, Q \subseteq \mathbb{R}^2$ such that $U(P) \subseteq
S \subseteq U(Q)$, and that $\frac{1}{2} - \varepsilon < A(P) < \frac{1}{2} < A(Q) < \frac{1}{2} + \varepsilon$. It will then follow from
Exercise 2.6.8, which is a variant of the No Gap Lemma (Lemma 2.6.6), that
\[ IC(S) = OC(S) = \frac{1}{2}, \]
and we will then deduce that $S$ is squarable and that $A(S) = \frac{1}{2}$.

Let $\varepsilon > 0$. There are many ways to construct the desired special polygons $P$ and
$Q$, and we will show one such construction. By Corollary 2.6.8 (2) there is some
$n \in \mathbb{N}$ such that $\frac{1}{n} < \varepsilon$. In Figure 5.9.2 we see a way of constructing $P$ and $Q$ (the
figure shows the case $n = 4$). In general, the special polygons $P$ and $Q$ each have
$n$ rectangles (one of the rectangles in $P$ is degenerate, and so it is not visible in the
figure), where all the rectangles have height $\frac{1}{n}$, and where the rectangles in $P$ have
widths $0, \frac{1}{n}, \ldots, \frac{n-1}{n}$ respectively, and the rectangles in $Q$ have widths
$\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}$ respectively. By using the well-known formula for the sum of the first $n$ integers,
found in Exercise 2.5.5, it is seen that $A(P) = \frac{1}{2} - \frac{1}{2n}$ and $A(Q) = \frac{1}{2} + \frac{1}{2n}$. Then clearly
\[ \frac{1}{2} - \varepsilon < A(P) < \frac{1}{2} < A(Q) < \frac{1}{2} + \varepsilon. \]


Fig. 5.9.2.


(2) Let $T = ([0,1] \times [0,1]) \cap (\mathbb{Q} \times \mathbb{Q}) \subseteq \mathbb{R}^2$. We will show that $T$ is not squarable.

Let $[a,b] \times [c,d] \subseteq \mathbb{R}^2$ be a rectangle. If both $[a,b]$ and $[c,d]$ are non-degenerate,
then each contains an irrational number by Theorem 2.6.13 (2), and it follows that
$[a,b] \times [c,d]$ could not be a subset of $T$. Hence, if $P \subseteq \mathbb{R}^2$ is a special polygon such
that $U(P) \subseteq T$, then $P$ is the union of finitely many degenerate rectangles, and hence
$A(P) = 0$. It follows that $IC(T) = 0$.

Let $Q \subseteq \mathbb{R}^2$ be a special polygon such that $T \subseteq U(Q)$. Suppose that $[0,1] \times
[0,1] \not\subseteq U(Q)$. Then there is some $x \in ([0,1] \times [0,1]) - U(Q)$. The rectangle $[0,1] \times [0,1]$
is a special polygon, and then using a remark made earlier in this section there is
a non-degenerate rectangle $R \subseteq \mathbb{R}^2$ such that $x \in R \subseteq U([0,1] \times [0,1]) - U(Q)$. By
Theorem 2.6.13 (1) there exists a point with rational coordinates in $R$, which is a
contradiction to the fact that $T \subseteq U(Q)$. Hence $[0,1] \times [0,1] \subseteq U(Q)$. It now follows that
$1 = A([0,1] \times [0,1]) \le A(Q)$. Hence 1 is a least element of $O_T$, and it follows from
Exercise 2.6.2 (3) that $OC(T) = \operatorname{glb} O_T = 1$.

Because $IC(T) \ne OC(T)$, then $T$ is not squarable. ◊
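
The areas of the special polygons in Part (1) of Example 5.9.7 can be computed exactly for any particular $n$. The following Python sketch is a numerical illustration only, not a replacement for the argument above; the function name `inner_outer_areas` is an assumption made for this example. It adds up the areas of the $n$ rectangles of height $\frac{1}{n}$ in each polygon.

```python
from fractions import Fraction


def inner_outer_areas(n):
    """Areas of the special polygons P and Q built from n rectangles of height 1/n."""
    height = Fraction(1, n)
    # Rectangles in P have widths 0, 1/n, ..., (n-1)/n.
    inner = sum(height * Fraction(k, n) for k in range(0, n))
    # Rectangles in Q have widths 1/n, 2/n, ..., n/n.
    outer = sum(height * Fraction(k, n) for k in range(1, n + 1))
    return inner, outer


for n in (4, 10, 100):
    inner, outer = inner_outer_areas(n)
    # A(P) = 1/2 - 1/(2n) and A(Q) = 1/2 + 1/(2n)
    assert inner == Fraction(1, 2) - Fraction(1, 2 * n)
    assert outer == Fraction(1, 2) + Fraction(1, 2 * n)
```

For $n = 4$, the case shown in Figure 5.9.2, this gives $A(P) = \frac{3}{8}$ and $A(Q) = \frac{5}{8}$.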


Before proceeding, we note that there is one slight problem with the definition 
of the area of squarable sets that needs to be clarified. Let $S \subseteq \mathbb{R}^2$ be a rectangle.
Then the area of $S$ was defined to be $A(S) = (b-a)(d-c)$ in Definition 5.9.3 (1).
On the other hand, we can think of $S$ simply as a subset of $\mathbb{R}^2$, and as such we can
ask whether it is squarable, and if it is, whether its area using Definition 5.9.6 equals 
(b—a)(d—c). If these two approaches to the area of a rectangle do not yield the same 
result, then our definition of area would be very questionable. Fortunately, as seen in 
Exercise 5.9.1, everything works out as one would hope. 

Although it is hard to evaluate the squarability of most subsets of $\mathbb{R}^2$ directly
from the definition, in the special case of the region under the graph of a non-negative 
integrable function, we can use integration to compute area, as seen in Theorem 5.9.9 
below. We start with the following definition. 


Definition 5.9.8. Let $f, g\colon [a,b] \to \mathbb{R}$ be functions. Suppose that $f(x) \le g(x)$ for all
$x \in [a,b]$. The region between the graphs of $f$ and $g$, denoted $R_a^b(f,g)$, is defined by
\[ R_a^b(f,g) = \{(x,y) \in \mathbb{R}^2 \mid a \le x \le b \text{ and } f(x) \le y \le g(x)\}. \]
If $f(x) \ge 0$ for all $x \in [a,b]$, the region under the graph of $f$, denoted $R_a^b(f)$, is
defined by
\[ R_a^b(f) = \{(x,y) \in \mathbb{R}^2 \mid a \le x \le b \text{ and } 0 \le y \le f(x)\}. \]
△


For the proof of the following theorem, recall the concept of upper integral and 
lower integral defined in Definition 5.4.8. 


Theorem 5.9.9. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is bounded, and that $f(x) \ge 0$ for all
$x \in [a,b]$. Then $R_a^b(f)$ is squarable if and only if $f$ is integrable, and if $f$ is integrable
then $A(R_a^b(f)) = \int_a^b f(x)\,dx$.

Proof. We will show that $IC(R_a^b(f)) = \underline{\int_a^b} f(x)\,dx$. A similar argument shows that
$OC(R_a^b(f)) = \overline{\int_a^b} f(x)\,dx$, and we omit the details. It will then follow that $R_a^b(f)$ is
squarable if and only if $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$, and by Theorem 5.4.10 we know that
the latter condition holds if and only if $f$ is integrable, and that if $f$ is integrable, then
\[ \int_a^b f(x)\,dx = \underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx = IC(R_a^b(f)) = OC(R_a^b(f)) = A(R_a^b(f)). \]
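Let
\[ \mathcal{L} = \{L(f,P) \mid P \text{ is a partition of } [a,b]\}. \]
By definition we know that $IC(R_a^b(f)) = \operatorname{lub} I_{R_a^b(f)}$ and $\underline{\int_a^b} f(x)\,dx = \operatorname{lub} \mathcal{L}$. We will
prove that (1) $\mathcal{L} \subseteq I_{R_a^b(f)}$, and that (2) for each $x \in I_{R_a^b(f)}$, there is some $y \in \mathcal{L}$ such
that $x \le y$. It will then follow from (1) together with Exercise 2.6.1 (1) that $\operatorname{lub} \mathcal{L} \le
\operatorname{lub} I_{R_a^b(f)}$, and it will follow from (2) together with Exercise 2.6.3 (1) that $\operatorname{lub} I_{R_a^b(f)} \le
\operatorname{lub} \mathcal{L}$, which together imply that $\operatorname{lub} I_{R_a^b(f)} = \operatorname{lub} \mathcal{L}$, which means that $IC(R_a^b(f)) =
\underline{\int_a^b} f(x)\,dx$.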

Let $V = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a,b]$. Let $\widehat{V}$ be the special polygon
defined by
\[ \widehat{V} = \{[x_{i-1}, x_i] \times [0, m_i(f)]\}_{i=1}^{n}. \tag{5.9.1} \]
It is evident that $A(\widehat{V}) = L(f,V)$. It follows that $\mathcal{L} \subseteq I_{R_a^b(f)}$, and so (1) has been
proved.

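Let $x \in I_{R_a^b(f)}$. Then $x = A(P)$ for some special polygon $P \subseteq \mathbb{R}^2$ such that $U(P) \subseteq
R_a^b(f)$. Then $P = \{[a_i, b_i] \times [c_i, d_i]\}_{i=1}^{n}$, where $[a_i, b_i] \times [c_i, d_i]$ is a rectangle for each
$i \in \{1, \ldots, n\}$, and where the interiors of any two of these rectangles are disjoint. See
Figure 5.9.3 (i).

Let $k \in \{1, \ldots, n\}$. Because $[a_k, b_k] \times [c_k, d_k] \subseteq R_a^b(f)$, it follows that $0 \le c_k \le d_k \le
f(x)$ for all $x \in [a_k, b_k]$. We deduce that $[a_k, b_k] \times [c_k, d_k] \subseteq [a_k, b_k] \times [0, d_k] \subseteq R_a^b(f)$.
Hence
\[ U(P) = \bigcup_{i=1}^{n} [a_i, b_i] \times [c_i, d_i] \subseteq \bigcup_{i=1}^{n} [a_i, b_i] \times [0, d_i] \subseteq R_a^b(f). \]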

See Figure 5.9.3 (ii). It might be the case that the rectangles of the form $[a_i, b_i] \times [0, d_i]$
overlap in their interiors. However, by subdividing these rectangles vertically if
necessary, we can find new rectangles $\{[p_j, q_j] \times [0, s_j]\}_{j=1}^{m}$ for some $m \in \mathbb{N}$, such
that
\[ U(P) \subseteq \bigcup_{i=1}^{n} [a_i, b_i] \times [0, d_i] \subseteq \bigcup_{j=1}^{m} [p_j, q_j] \times [0, s_j] \subseteq R_a^b(f), \]
and that the interiors of any two of these new rectangles are disjoint. See Figure 5.9.3
(iii). By renumbering the rectangles in $\{[p_j, q_j] \times [0, s_j]\}_{j=1}^{m}$ if necessary, we may
suppose that $a \le p_1 \le q_1 \le p_2 \le q_2 \le \cdots \le p_m \le q_m \le b$. Because the numbers
$a, p_1, q_1, \ldots, p_m, q_m, b$ might not all be distinct, we let $y_0, y_1, \ldots, y_v$ be the same list
of numbers, in the same order, but with each number listed only once. Then $Q =
\{y_0, y_1, \ldots, y_v\}$ is a partition of $[a,b]$. For each $r \in \{1, \ldots, v\}$, we form the rectangle
$[y_{r-1}, y_r] \times [0, t_r]$, where
\[ t_r = \begin{cases} s_j, & \text{if } [y_{r-1}, y_r] = [p_j, q_j] \text{ for some } j \in \{1, \ldots, m\} \\ 0, & \text{otherwise.} \end{cases} \]


Fig. 5.9.3. [Panels (i), (ii) and (iii).]


By definition, the collection of rectangles $\{[y_{r-1}, y_r] \times [0, t_r]\}_{r=1}^{v}$ consists of all
rectangles of the form $[p_j, q_j] \times [0, s_j]$ for $j \in \{1, \ldots, m\}$, with possibly some additional
degenerate rectangles located in the $x$-axis. It follows that
\[ U(P) \subseteq \bigcup_{j=1}^{m} [p_j, q_j] \times [0, s_j] \subseteq \bigcup_{r=1}^{v} [y_{r-1}, y_r] \times [0, t_r] \subseteq R_a^b(f). \]
Let $r \in \{1, \ldots, v\}$. Then $[y_{r-1}, y_r] \times [0, t_r] \subseteq R_a^b(f)$. It follows that $t_r \le f(x)$ for
all $x \in [y_{r-1}, y_r]$. We deduce that $t_r \le m_r(f)$. Let $\widehat{Q}$ be the special polygon defined
analogously to Equation 5.9.1. Then
\[ U(P) \subseteq \bigcup_{r=1}^{v} [y_{r-1}, y_r] \times [0, t_r] \subseteq U(\widehat{Q}) \subseteq R_a^b(f). \]
We deduce that $x = A(P) \le A(\widehat{Q}) = L(f,Q)$. If we let $y = L(f,Q)$, then $y \in \mathcal{L}$. We
have therefore shown that there is some $y \in \mathcal{L}$ such that $x \le y$, and so (2) has been
proved.
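
Step (1) of the proof, that a lower sum is the area of a special polygon contained in the region under the graph, can be illustrated on a concrete function. The sketch below is only an illustration under stated assumptions: the function $f(x) = x^2$ on $[0,1]$, the uniform partition, and the names `lower_sum_polygon` and `polygon_area` are choices made for this example and do not come from the text.

```python
from fractions import Fraction


def lower_sum_polygon(partition, inf_on):
    """Rectangles [x_{i-1}, x_i] x [0, m_i(f)], stored as tuples (a, b, c, d).

    inf_on(x0, x1) must return the greatest lower bound of f on [x0, x1];
    the caller supplies it, because it cannot be found by sampling f.
    """
    return [(x0, x1, Fraction(0), inf_on(x0, x1))
            for x0, x1 in zip(partition, partition[1:])]


def polygon_area(rects):
    """Sum of the areas of the rectangles in a special polygon."""
    return sum((b - a) * (d - c) for a, b, c, d in rects)


# f(x) = x^2 is increasing on [0, 1], so its infimum on [x0, x1] is x0^2.
n = 4
partition = [Fraction(i, n) for i in range(n + 1)]
polygon = lower_sum_polygon(partition, lambda x0, x1: x0 ** 2)

lower_sum = polygon_area(polygon)  # equals L(f, partition)
print(lower_sum)                   # 7/32
```

The value $\frac{7}{32}$ is an element of both $\mathcal{L}$ and $I_{R_0^1(f)}$ for this $f$, and it is less than $\frac{1}{3}$, the integral of $x^2$ over $[0,1]$.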


It is important to stress that Theorem 5.9.9 applies only to non-negative integrable 
functions. For arbitrary integrable functions, it is not the case that integration yields 
the area between the graph of the function and the x-axis. Rather, integration gives
“signed area,” in the following sense. If a function is non-positive, it follows from
Exercise 5.9.3 and Theorem 5.3.1 (3) that the integral of the function yields the 
negative of the area above the graph (meaning the area of the region between the 
graph of the function and the x-axis). For an arbitrary integrable function, if the 
domain of the function can be broken up into finitely many subintervals such that the 
function is either non-negative or non-positive on each subinterval, then by Corol- 
lary 5.5.9 the integral yields the area under the graph of the non-negative parts minus 
the area above the non-positive parts. 
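
A small worked case may make the notion of signed area concrete. The sketch below is an illustration only: the choice of $f(x) = x$ on $[-1,2]$ and the helper name `integral_of_identity` are assumptions made for this example, and the helper uses the familiar value $\frac{b^2 - a^2}{2}$ for the integral of $f$ over $[a,b]$.

```python
from fractions import Fraction


def integral_of_identity(a, b):
    """Exact value of the integral of f(x) = x over [a, b], namely (b^2 - a^2)/2."""
    return (Fraction(b) ** 2 - Fraction(a) ** 2) / 2


signed = integral_of_identity(-1, 2)            # integral over the whole interval
area_above_axis = integral_of_identity(0, 2)    # f is non-negative on [0, 2]
area_below_axis = -integral_of_identity(-1, 0)  # f is non-positive on [-1, 0]

# The integral is the area under the non-negative part minus the area above
# the non-positive part; it is not the total area between graph and x-axis.
assert signed == area_above_axis - area_below_axis
print(signed, area_above_axis + area_below_axis)  # 3/2 5/2
```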

In order to make use of Theorem 5.9.9 to find the area of the region between the 
graphs of two functions, we first need the following more general fact about areas 
of squarable subsets of the plane. This result might seem obvious intuitively, but the 
proof requires a bit more effort than might at first be assumed. 


Theorem 5.9.10. Let $S, T \subseteq \mathbb{R}^2$ be non-empty sets. Suppose that $S$ and $T$ are bounded,
and that $OC(S \cap T) = 0$. If any two of $S$, $T$ and $S \cup T$ are squarable then so is the
third, and if they are squarable then $A(S) + A(T) = A(S \cup T)$.


Proof. Suppose that S and T are squarable. The other two cases are similar, and we 
omit the details. 

Using the notation of Exercise 2.6.9, we will prove that $\operatorname{lub}(I_S + I_T) = \operatorname{lub} I_{S \cup T}$ and
$\operatorname{glb}(O_S + O_T) = \operatorname{glb} O_{S \cup T}$. It will then follow from Exercise 2.6.9 (3) (4) that
$\operatorname{lub} I_S + \operatorname{lub} I_T = \operatorname{lub} I_{S \cup T}$ and $\operatorname{glb} O_S + \operatorname{glb} O_T = \operatorname{glb} O_{S \cup T}$, and therefore that
$IC(S) + IC(T) = IC(S \cup T)$ and $OC(S) + OC(T) = OC(S \cup T)$. Because $S$ and $T$ are
squarable, we know that $A(S) = IC(S) = OC(S)$ and $A(T) = IC(T) = OC(T)$, and it will
then follow that $IC(S \cup T) = IC(S) + IC(T) = OC(S) + OC(T) = OC(S \cup T)$, which means
that $S \cup T$ is squarable, and that $A(S \cup T) = IC(S \cup T) = IC(S) + IC(T) = A(S) + A(T)$.

Let $z \in O_S + O_T$. Then $z = x + y$ for some $x \in O_S$ and $y \in O_T$. Then $x = A(P)$ and
$y = A(Q)$ for some special polygons $P, Q \subseteq \mathbb{R}^2$ such that $S \subseteq U(P)$ and $T \subseteq U(Q)$.
Then $S \cup T \subseteq U(P \cup Q)$, and $A(P \cup Q) = A(P) + A(Q) - A(P \cap Q) \le A(P) + A(Q)$.
Hence there is some $w \in O_{S \cup T}$, specifically, the number $w = A(P \cup Q)$, such that
$w \le x + y = z$. It then follows from Exercise 2.6.3 (2) that $\operatorname{glb} O_{S \cup T} \le \operatorname{glb}(O_S + O_T)$.

Let $a \in O_{S \cup T}$. Let $\varepsilon > 0$. Then $a = A(G)$ for some special polygon $G \subseteq \mathbb{R}^2$ such
that $S \cup T \subseteq U(G)$. Because $OC(S \cap T) = 0$, it follows from Lemma 2.6.5 (2) that
there is some $b \in O_{S \cap T}$ such that $b < \frac{\varepsilon}{2}$. Then $b = A(C)$ for some special polygon
$C \subseteq \mathbb{R}^2$ such that $S \cap T \subseteq U(C)$. Without loss of generality, we may assume that
$U(C) \subseteq U(G)$, because if not, we could replace $C$ with $C \cap G$, and this replacement
for $C$ would have the same properties as the original $C$.

Because $S$ is squarable, then by the definition of squarability together with
Lemma 5.9.5 (3) there are special polygons $V, W \subseteq \mathbb{R}^2$ such that $U(V) \subseteq S \subseteq U(W)$
and $A(W) - A(V) < \frac{\varepsilon}{2}$. Without loss of generality, we may assume that $U(W) \subseteq U(G)$,
because if not, we could replace $W$ with $W \cap G$, and this replacement for $W$
would have the same properties as the original $W$. Observe that $U(V) \subseteq U(G)$. Let
$B = [G - V] \cup C$. Then $A(B) \le [A(G) - A(V)] + A(C)$. It is left to the reader to verify
that $T \subseteq U(B)$, and that $U(W \cap B) \subseteq [U(W) - U(V)] \cup U(C)$. Then
$A(W \cap B) \le [A(W) - A(V)] + A(C) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$. It can also be verified that $U(W \cup B) = U(G)$.
Therefore $A(G) = A(W) + A(B) - A(W \cap B) > A(W) + A(B) - \varepsilon$, which implies that
$A(W) + A(B) < A(G) + \varepsilon$. Let $r = A(W)$ and $s = A(B)$. Then $r + s < a + \varepsilon$. Because
$r + s \in O_S + O_T$, it follows from Exercise 2.6.3 (2) that $\operatorname{glb}(O_S + O_T) \le \operatorname{glb} O_{S \cup T}$. We
deduce that $\operatorname{glb}(O_S + O_T) = \operatorname{glb} O_{S \cup T}$.

Let $e \in I_S + I_T$. Then $e = f + g$ for some $f \in I_S$ and $g \in I_T$. Then $f = A(H)$ and
$g = A(K)$ for some special polygons $H, K \subseteq \mathbb{R}^2$ such that $U(H) \subseteq S$ and $U(K) \subseteq T$.
Then $U(H \cup K) \subseteq S \cup T$, and $A(H \cup K) = A(H) + A(K) - A(H \cap K)$, and
$U(H \cap K) \subseteq S \cap T$. Because $OC(S \cap T) = 0$, then Lemma 5.9.5 (2) implies that $IC(S \cap T) = 0$, and
it follows that $A(H \cap K) = 0$. Therefore $A(H \cup K) = A(H) + A(K)$. Hence there is
some $m \in I_{S \cup T}$, specifically, the number $m = A(H \cup K)$, such that $m = f + g = e$. It
then follows from Exercise 2.6.3 (1) that $\operatorname{lub} I_{S \cup T} \ge \operatorname{lub}(I_S + I_T)$.

Let $k \in I_{S \cup T}$. Let $\varepsilon > 0$. Then $k = A(L)$ for some special polygon $L \subseteq \mathbb{R}^2$ such
that $U(L) \subseteq S \cup T$. Let $b$ and $C$ be as above. Then $U(L - C) \subseteq (S - T) \cup (T - S)$.
Let $M$ be the collection of those rectangles in $L - C$ that are contained in $S - T$,
and let $N$ be the collection of those rectangles in $L - C$ that are contained in $T - S$.
We see that $L - C = M \cup N$, and $M \cap N = \emptyset$, and $U(M) \subseteq S$ and $U(N) \subseteq T$. Then
$A(M) + A(N) = A(M \cup N) \ge A(L) - A(C) > A(L) - \varepsilon$. Let $v = A(M)$ and $w = A(N)$.
Then $v + w > k - \varepsilon$. Because $v + w \in I_S + I_T$, it follows from Exercise 2.6.3 (1) that
$\operatorname{lub}(I_S + I_T) \ge \operatorname{lub} I_{S \cup T}$. We deduce that $\operatorname{lub}(I_S + I_T) = \operatorname{lub} I_{S \cup T}$.
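
The area identities for special polygons used in the proof above can be explored computationally. The following Python sketch is an illustration only, under assumptions that are not part of the text: a rectangle $[x_0, x_1] \times [y_0, y_1]$ is represented by the tuple `(x0, x1, y0, y1)`, and the hypothetical helper `union_area` computes the area of a finite union of such rectangles by cutting the plane along all of the rectangle edges.

```python
from itertools import product

def union_area(rectangles):
    """Area of a finite union of rectangles (x0, x1, y0, y1) with edges parallel to the axes."""
    xs = sorted({x for (x0, x1, _, _) in rectangles for x in (x0, x1)})
    ys = sorted({y for (_, _, y0, y1) in rectangles for y in (y0, y1)})
    total = 0
    # Each cell of the grid cut out by the rectangle edges is either
    # contained in some rectangle or meets the union only along edges.
    for (xa, xb), (ya, yb) in product(zip(xs, xs[1:]), zip(ys, ys[1:])):
        if any(x0 <= xa and xb <= x1 and y0 <= ya and yb <= y1
               for (x0, x1, y0, y1) in rectangles):
            total += (xb - xa) * (yb - ya)
    return total

S = [(0, 2, 0, 1)]          # area 2
T = [(2, 3, 0, 2)]          # area 2, meets S only along the segment x = 2
T_overlap = [(1, 3, 0, 2)]  # area 4, meets S in [1, 2] x [0, 1], which has area 1

print(union_area(S + T))          # 4 = 2 + 2
print(union_area(S + T_overlap))  # 5 = 2 + 4 - 1
```

When the two sets meet only along an edge, which has outer content zero, the areas add; when they overlap in a set of positive area, the area of the intersection must be subtracted.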


The following theorem is proved using Theorem 5.9.9 and Theorem 5.9.10. 


Theorem 5.9.11. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f, g\colon [a,b] \to \mathbb{R}$ be functions. Suppose that $f(x) \le g(x)$ for all $x \in [a,b]$, and that
$f$ and $g$ are integrable. Then the region between the graphs of $f$ and $g$ is squarable
and

$$A(R_a^b(f,g)) = \int_a^b [g - f](x)\,dx.$$


Proof. Left to the reader in Exercise 5.9.7. 
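
As a numerical illustration of Theorem 5.9.11, the following Python sketch squeezes the region between the graphs of $f(x) = x^2$ and $g(x) = x$ on $[0,1]$ between two collections of rectangles. It is a sketch under assumptions that are not part of the theorem: both functions are increasing (which is what makes the rectangle heights easy to write down), the partition is uniform, and the helper name `inner_outer_areas` is hypothetical. The integral of $g - f$ over $[0,1]$ is $\frac{1}{2} - \frac{1}{3} = \frac{1}{6}$.

```python
def inner_outer_areas(f, g, a, b, n):
    """Total areas of rectangles inside and outside the region between increasing f <= g."""
    width = (b - a) / n
    inner = 0.0
    outer = 0.0
    for i in range(n):
        left = a + i * width
        right = a + (i + 1) * width
        # On [left, right], f(x) <= f(right) and g(left) <= g(x), so the rectangle
        # [left, right] x [f(right), g(left)] lies inside the region when it is non-empty.
        inner += width * max(0.0, g(left) - f(right))
        # Similarly, [left, right] x [f(left), g(right)] contains this slice of the region.
        outer += width * (g(right) - f(left))
    return inner, outer

def f(x):
    return x * x

def g(x):
    return x

inner, outer = inner_outer_areas(f, g, 0.0, 1.0, 10_000)
print(inner, 1 / 6, outer)
```

The inner and outer totals bracket $\frac{1}{6}$, and the gap between them shrinks as the number of subintervals grows.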


As mentioned above, we have defined area using only special polygons, and not 
arbitrary polygons. It is technically simpler to use only special polygons, but the 
reader would have good reason to ask whether we would have obtained a different 
definition of area had we used more general polygons. Fortunately, it turns out that 
computing inner content and outer content using arbitrary polygons always gives the 
same result as using only special polygons. We will not go through a proof of this fact, 
because doing so requires a rigorous definition of arbitrary polygons (which for the 
purpose of computing area must include degenerate polygons); one of the advantages 
of using special polygons is precisely that it allows us to avoid such a definition. 
The intuitive idea of such a proof, however, is seen in Figure 5.9.4. Suppose that we 
have an arbitrary polygon contained in a subset of the plane, as in Figure 5.9.4 (i). 
If the polygon has an edge that is not parallel to the x-axis or y-axis, then we can 
replace it with a “staircase” and break up the modified polygon into rectangles, as 
in Figure 5.9.4 (ii), yielding a special polygon. Although the new special polygon 
has smaller area than the original polygon, the special polygon can be chosen to 
have area within any $\varepsilon > 0$ of the area of the original polygon. It then follows from
Exercise 2.6.1 (1) and Exercise 2.6.3 (1) that the least upper bound of the areas of 
special polygons contained in the region of the plane equals the least upper bound of 
the areas of all polygons contained in the region. Hence the definition of inner content
would not change had we used arbitrary polygons instead of just special polygons. 
A similar result holds for outer content, and so we might as well stick with special 
polygons because they are easier to work with. 


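Fig. 5.9.4. (i) An arbitrary polygon contained in a subset of the plane; (ii) the special
polygon obtained by replacing a slanted edge with a “staircase.”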
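
The staircase idea can be made concrete in a simple case. The following Python sketch is an illustration under stated assumptions: the polygon is taken to be the triangle with vertices $(0,0)$, $(1,0)$ and $(0,1)$, whose area is $\frac{1}{2}$, and the hypothetical function `staircase_area` adds up the areas of $n$ rectangles of equal width that fit under the slanted edge $y = 1 - x$. Exact rational arithmetic is used so that the gap can be read off without rounding.

```python
from fractions import Fraction

def staircase_area(n):
    """Area of an n-step staircase special polygon inside the triangle (0,0), (1,0), (0,1)."""
    width = Fraction(1, n)
    # Step i is the rectangle [i/n, (i+1)/n] x [0, 1 - (i+1)/n],
    # which lies under the edge y = 1 - x.
    return sum(width * (1 - Fraction(i + 1, n)) for i in range(n))

for n in (10, 100, 1000):
    print(n, staircase_area(n), Fraction(1, 2) - staircase_area(n))
```

The staircase has area $\frac{n-1}{2n}$, which is smaller than $\frac{1}{2}$ by exactly $\frac{1}{2n}$, so the gap can be made smaller than any given $\varepsilon > 0$ by taking enough steps.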


We now turn to the arc length of graphs of functions. It is also possible to find the 
arc length of more general curves in the plane, but doing so requires using functions 
$\mathbb{R} \to \mathbb{R}^2$ rather than functions $\mathbb{R} \to \mathbb{R}$, and hence is outside the scope of this book.

As was the case in our discussion of area, we start with a geometric definition of 
arc length, which does not involve differentiation and integration, and only after that 
do we show that for nicely behaved functions we can use integration to compute arc 
length. The intuitive idea is that we calculate the arc length of a graph of a function by 
approximating the graph by polygonal arcs made up of straight line segments between 
points on the graph, and then taking some sort of limit. As was the case with our 
definition of area, the “limit” will be computed by finding a least upper bound. In 
contrast to the definition of area, however, where we approximated a subset of the 
plane from both the inside and the outside, and hence we needed both least upper 
bounds and greatest lower bounds, for the arc length of the graph of a function, there 
is no “inside” or “outside,” and we will need only least upper bounds, because the 
length of a polygonal approximation of the graph of a function is always less than or 
equal to the length of the graph itself. 

Just as we based our discussion of the area of arbitrary subsets of the plane on the 
formula for the area of a rectangle, and we then added up such areas to approximate the 
area of a more complicated set, for arc length we base our discussion on the formula 
for the length of a line segment, and we then add up such lengths to approximate the 
arc length of a more complicated curve. We start with the following definition, which 
uses partitions of closed bounded intervals to determine polygonal approximations 
to graphs of functions, and the Pythagorean Theorem to compute lengths of line 
segments. See Figure 5.9.5 (i) for the graph of a function, and Part (ii) of that figure 
for a polygonal arc that approximates the graph. 


Definition 5.9.12. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$f\colon [a,b] \to \mathbb{R}$ be a function and let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a,b]$. The
polygonal sum of $f$ with respect to $P$, denoted $C(f,P)$, is defined by

$$C(f,P) = \sum_{i=1}^{n} \sqrt{[x_i - x_{i-1}]^2 + [f(x_i) - f(x_{i-1})]^2}. \qquad \triangle$$

Fig. 5.9.5. (i) The graph of a function; (ii) a polygonal arc that approximates the graph,
with partition points $x_0, x_1, x_2, x_3, x_4$.


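Example 5.9.13. Let $f\colon [0,1] \to \mathbb{R}$ be defined by $f(x) = x^{3/2}$ for all $x \in [0,1]$, and let
$P = \{0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1\}$. Then

$$C(f,P) = \sum_{i=1}^{4} \sqrt{\left[\tfrac{1}{4}\right]^2 + \left[\left(\tfrac{i}{4}\right)^{3/2} - \left(\tfrac{i-1}{4}\right)^{3/2}\right]^2} \approx 1.436.$$

The actual length of the curve will be computed in Example 5.9.18. ◊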
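
The polygonal sum in Example 5.9.13 is easy to reproduce by machine. The following Python sketch is a minimal illustration of Definition 5.9.12; the function name `polygonal_sum` and the representation of a partition as an increasing list of numbers are assumptions made for this sketch.

```python
import math

def polygonal_sum(f, partition):
    """Polygonal sum C(f, P) for a partition given as an increasing list of points."""
    # math.hypot(dx, dy) is the length of the segment with horizontal
    # change dx and vertical change dy, by the Pythagorean Theorem.
    return sum(math.hypot(x1 - x0, f(x1) - f(x0))
               for x0, x1 in zip(partition, partition[1:]))

def f(x):
    return x ** 1.5

P = [0, 0.25, 0.5, 0.75, 1]
print(round(polygonal_sum(f, P), 3))  # 1.436
```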


The following lemma about polygonal sums is quite reasonable intuitively if we 
think about lengths of edges of triangles, though the proof, which uses some ideas 
from linear algebra, is a bit more complicated than might be expected. These ideas 
are sketched out in an exercise, with no assumption that the reader is familiar with 
linear algebra. 


Lemma 5.9.14. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$f\colon [a,b] \to \mathbb{R}$ be a function and let $P$ and $Q$ be partitions of $[a,b]$. If $Q$ is a refinement
of $P$, then $C(f,Q) \ge C(f,P)$.


Proof. Left to the reader in Exercise 5.9.9. 
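
Lemma 5.9.14 can be illustrated, though of course not proved, by a numerical experiment. In the following Python sketch the test function `g`, the partitions `P` and `Q`, and the helper `polygonal_sum` are all choices made for this illustration; the function was deliberately chosen to be non-differentiable at one point, because the lemma requires no hypotheses on the function.

```python
import math

def polygonal_sum(f, partition):
    """Polygonal sum C(f, P) for a partition given as an increasing list of points."""
    return sum(math.hypot(x1 - x0, f(x1) - f(x0))
               for x0, x1 in zip(partition, partition[1:]))

def g(x):
    # An arbitrary test function on [0, 3]; it is not differentiable at x = 1.3.
    return abs(x - 1.3) + math.sin(5 * x)

P = [0.0, 1.0, 2.0, 3.0]
Q = sorted(set(P) | {0.5, 1.3, 2.5})  # Q contains every point of P, so Q refines P

print(polygonal_sum(g, P), polygonal_sum(g, Q))
print(polygonal_sum(g, P) <= polygonal_sum(g, Q))  # True
```

Inserting a point replaces one segment with two segments that have the same endpoints overall, which can only lengthen the polygonal arc.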


We are now ready to give the definition of the arc length of graphs of functions,
when such arc lengths exist. 


Definition 5.9.15. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and
let $f\colon [a,b] \to \mathbb{R}$ be a function. Let

$$\mathcal{A}_f = \{C(f,P) \mid P \text{ is a partition of } [a,b]\}.$$

The function $f$ is rectifiable if $\mathcal{A}_f$ is bounded above. If $f$ is rectifiable, the arc length
of $f$, denoted $L_a^b(f)$, is defined by $L_a^b(f) = \operatorname{lub} \mathcal{A}_f$. △


Observe that the term “rectifiable” is analogous to “squarable,” and the term “arc 
length” is analogous to “area.” 


Example 5.9.16. 


(1) Let $f\colon [0,2] \to \mathbb{R}$ be defined by $f(x) = 3x$ for all $x \in [0,2]$. We will show that
$f$ is rectifiable and find its arc length. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[0,2]$.
Then

$$
\begin{aligned}
C(f,P) &= \sum_{i=1}^{n} \sqrt{[x_i - x_{i-1}]^2 + [f(x_i) - f(x_{i-1})]^2} = \sum_{i=1}^{n} \sqrt{[x_i - x_{i-1}]^2 + [3x_i - 3x_{i-1}]^2} \\
&= \sum_{i=1}^{n} \sqrt{10}\,(x_i - x_{i-1}) = \sqrt{10}\,(x_n - x_0) = 2\sqrt{10}.
\end{aligned}
$$

Hence, all polygonal sums of $f$ are equal to $2\sqrt{10}$, and therefore $\mathcal{A}_f = \{2\sqrt{10}\}$.
Clearly $\mathcal{A}_f$ is bounded above, and hence $f$ is rectifiable. Moreover, we see that
$L_0^2(f) = \operatorname{lub} \mathcal{A}_f = 2\sqrt{10}$. Given that the graph of $f$ is a straight line, a quick calculation
with the Pythagorean Theorem shows that the arc length that we computed with
polygonal sums is the expected answer.

(2) Let $r\colon [0,1] \to \mathbb{R}$ be defined by

$$r(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0, & \text{otherwise.} \end{cases}$$

It was shown in Example 3.3.3 (6) that $r$ is discontinuous everywhere, and in Example 5.2.6 (3)
that $r$ is not integrable. We will now show that this function is not
rectifiable. Let $M \in \mathbb{R}$. We will show that there is a partition $Q$ of $[0,1]$ such that
$C(r,Q) > M$. It will follow that $\mathcal{A}_r$ is not bounded above, which means that $r$ is not
rectifiable.

By Corollary 2.6.8 (1) there is some $p \in \mathbb{Z}$ such that $M < p$. By taking a larger
value of $p$ if necessary, we may assume that $p > 0$ and that $p$ is even. Then $p = 2n$
for some $n \in \mathbb{N}$. Let $Q = \{x_0, x_1, \ldots, x_{2n}\}$ be a partition of $[0,1]$ defined as follows.
For each even number $i \in \{0, 1, \ldots, 2n\}$, let $x_i = \frac{i}{2n}$. For each odd number
$i \in \{0, 1, \ldots, 2n\}$, let $x_i$ be an irrational number such that $x_{i-1} < x_i < x_{i+1}$, where such
an $x_i$ can be found by Theorem 2.6.13 (2). Then


$$
\begin{aligned}
C(r,Q) &= \sum_{i=1}^{2n} \sqrt{[x_i - x_{i-1}]^2 + [r(x_i) - r(x_{i-1})]^2} \\
&= \sum_{i=1}^{n} \sqrt{[x_{2i-1} - x_{2i-2}]^2 + [r(x_{2i-1}) - r(x_{2i-2})]^2} + \sum_{i=1}^{n} \sqrt{[x_{2i} - x_{2i-1}]^2 + [r(x_{2i}) - r(x_{2i-1})]^2} \\
&= \sum_{i=1}^{n} \sqrt{[x_{2i-1} - x_{2i-2}]^2 + (0 - 1)^2} + \sum_{i=1}^{n} \sqrt{[x_{2i} - x_{2i-1}]^2 + (1 - 0)^2} \\
&\ge \sum_{i=1}^{n} 1 + \sum_{i=1}^{n} 1 = 2n = p > M.
\end{aligned}
$$

Whereas the discontinuous function $r$ is not rectifiable, we note that some discontinuous
functions are rectifiable, as the reader is asked to prove in Exercise 5.9.10.

(3) In Part (2) of this example we saw a function that is discontinuous and not 
rectifiable. The reader might wonder whether there is an example of a continuous 
function that is not rectifiable. There are such functions, though they are not as simple 
to describe as the function in Part (2) of this example. The intuitive idea is that we need 
a function defined on a non-degenerate closed bounded interval that is continuous but 
has a graph that is infinitely wiggly, which makes it have infinite length. An example 
of such a function is seen in Exercise 10.5.3; this function is a special case of the 
continuous but nowhere differentiable functions discussed in Section 10.5. ◊
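
Parts (1) and (2) of Example 5.9.16 can be mimicked numerically. The following Python sketch is an illustration under stated assumptions. The helper names `polygonal_length` and `alternating_sum` are hypothetical. For Part (1), a randomly chosen partition of $[0,2]$ stands in for an arbitrary one. For Part (2), floating point numbers cannot distinguish rational from irrational numbers, so the values of $r$ are assigned by hand: value 1 at the even-indexed partition points, and value 0 at the odd-indexed points, which merely stand in for irrational numbers lying between their neighbors.

```python
import math
import random

def polygonal_length(points):
    """Length of the polygonal arc through the given (x, y) points, listed by increasing x."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Part (1): f(x) = 3x on [0, 2]. Every partition should give 2 * sqrt(10).
random.seed(0)
P = [0.0] + sorted(random.uniform(0.0, 2.0) for _ in range(20)) + [2.0]
linear_sum = polygonal_length([(x, 3 * x) for x in P])
print(linear_sum, 2 * math.sqrt(10))

# Part (2): alternate between points where r = 1 and points where r = 0.
def alternating_sum(n):
    points = []
    for i in range(2 * n + 1):
        x = i / (2 * n)
        # Even index: a rational point, so r(x) = 1.
        # Odd index: stand-in for an irrational point, so r(x) = 0 is assigned by hand.
        points.append((x, 1.0 if i % 2 == 0 else 0.0))
    return polygonal_length(points)

for n in (5, 50, 500):
    print(n, alternating_sum(n))  # each value is at least 2n
```

Each of the $2n$ segments in the alternating arc has vertical change 1, so its length is at least 1, which is exactly the estimate used in Part (2).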


It is hard to evaluate the rectifiability of most functions directly from the definition. 
See [Tri95, Chapter 7] for a thorough geometric discussion of rectifiable curves, and 
elsewhere in that book for the relation of non-rectifiable curves to fractal curves. 
Fortunately, if a function is continuously differentiable, the situation is much easier, 
as we see in the following theorem. We note that in contrast to Theorem 5.9.9, the 
following theorem is not an “if and only if” result, because it is possible for the graph 
of a non-differentiable function to be rectifiable, for example the graph of the absolute 
value function. 


Theorem 5.9.17. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is continuous on $[a,b]$ and continuously
differentiable on $(a,b)$, and that $f'$ is bounded on $(a,b)$. Then $f$ is rectifiable and

$$L_a^b(f) = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx.$$


Proof. Let $g\colon [a,b] \to \mathbb{R}$ be defined by

$$g(x) = \begin{cases} \sqrt{1 + [f'(x)]^2}, & \text{if } x \in (a,b) \\ 0, & \text{otherwise.} \end{cases}$$

Because $f'$ is bounded on $(a,b)$, there is some $M \in \mathbb{R}$ such that $|f'(x)| \le M$ for all
$x \in (a,b)$; we may assume that $M > 0$. Then $f'((a,b)) \subseteq [-M,M]$. Let $h\colon [-M,M] \to \mathbb{R}$
be defined by $h(x) = \sqrt{1 + x^2}$ for all $x \in [-M,M]$. It is left to the reader to verify
that $h$ is continuous on $[-M,M]$ and differentiable on $(-M,M)$, and that $|h'(x)| \le 1$
for all $x \in (-M,M)$; in addition to using facts we have already seen, the reader will
need to use Theorem 7.2.13 (1). It then follows from Exercise 4.4.6 that $h$ is uniformly
continuous. By Theorem 3.4.5 we see that $h$ is bounded. It follows immediately
that $h|_{f'((a,b))}$ is bounded, and it follows from Lemma 3.4.2 and Exercise 3.3.2 (2)
that $h|_{f'((a,b))}$ is continuous. By abuse of notation we can think of $f'$ as a function
$(a,b) \to f'((a,b))$. Then $g|_{(a,b)} = h|_{f'((a,b))} \circ f'$. We deduce immediately that $g|_{(a,b)}$
is bounded, and it follows from Theorem 3.3.8 (3) that $g|_{(a,b)}$ is continuous. Then
$g$ is bounded, and $g$ is continuous except possibly at $a$ and $b$. It now follows from
Exercise 5.5.5 (2) that $g$ is integrable.

As a preliminary step, we show that if $P$ is a partition of $[a,b]$, then there is
a representative set $S$ of $P$ such that $C(f,P) = S(g,P,S)$. Let $P = \{x_0, x_1, \ldots, x_n\}$
be a partition of $[a,b]$. Let $i \in \{1, \ldots, n\}$. Because $f$ is continuous on $[x_{i-1}, x_i]$ and
differentiable on $(x_{i-1}, x_i)$, the Mean Value Theorem (Theorem 4.4.4) implies that
there is some $s_i \in (x_{i-1}, x_i)$ such that

$$f'(s_i) = \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}.$$

Hence $f(x_i) - f(x_{i-1}) = f'(s_i)(x_i - x_{i-1})$. Let $S = \{s_1, \ldots, s_n\}$. Then $S$ is a representative
set of $P$. We now see that

$$
\begin{aligned}
C(f,P) &= \sum_{i=1}^{n} \sqrt{[x_i - x_{i-1}]^2 + [f(x_i) - f(x_{i-1})]^2} \\
&= \sum_{i=1}^{n} \sqrt{1 + [f'(s_i)]^2}\,(x_i - x_{i-1}) = S(g,P,S).
\end{aligned}
$$


We now show that $\int_a^b g(x)\,dx$ is an upper bound of $\mathcal{A}_f$. Suppose to the contrary that
there is some partition $X$ of $[a,b]$ such that $C(f,X) > \int_a^b g(x)\,dx$. Let
$\varepsilon = C(f,X) - \int_a^b g(x)\,dx$. Then $\varepsilon > 0$. Because $g$ is integrable, there is some $\delta > 0$ such that if
$Q$ is a partition of $[a,b]$ with $\|Q\| < \delta$, and if $T$ is a representative set of $Q$, then
$\left|S(g,Q,T) - \int_a^b g(x)\,dx\right| < \varepsilon$. Let $Y$ be a partition of $[a,b]$ with $\|Y\| < \delta$, which exists
by Exercise 5.2.1. Let $Z = X \cup Y$. By Lemma 5.4.3 we know that $Z$ is a refinement
of both $X$ and $Y$, and that $\|Z\| \le \|Y\| < \delta$. It follows from Lemma 5.9.14 that
$C(f,Z) \ge C(f,X) > \int_a^b g(x)\,dx$. Hence

$$\left|C(f,Z) - \int_a^b g(x)\,dx\right| = C(f,Z) - \int_a^b g(x)\,dx \ge C(f,X) - \int_a^b g(x)\,dx = \varepsilon.$$

By the preliminary step there is a representative set $H$ of $Z$ such that $C(f,Z) = S(g,Z,H)$.
Hence $\left|S(g,Z,H) - \int_a^b g(x)\,dx\right| \ge \varepsilon$. On the other hand, because $\|Z\| < \delta$,
we know that $\left|S(g,Z,H) - \int_a^b g(x)\,dx\right| < \varepsilon$, which is a contradiction. We conclude
that $\int_a^b g(x)\,dx$ is an upper bound of $\mathcal{A}_f$, and it follows that $f$ is rectifiable.


We now show that $\int_a^b g(x)\,dx = \operatorname{lub} \mathcal{A}_f$, which means that $\int_a^b g(x)\,dx = L_a^b(f)$. Let
$\eta > 0$. Because $g$ is integrable, there is some $\beta > 0$ such that if $Q$ is a partition of $[a,b]$
with $\|Q\| < \beta$, and if $T$ is a representative set of $Q$, then $\left|S(g,Q,T) - \int_a^b g(x)\,dx\right| < \eta$.
Let $W$ be a partition of $[a,b]$ with $\|W\| < \beta$, which exists by Exercise 5.2.1. Again
using the preliminary step, there is a representative set $D$ of $W$ such that
$C(f,W) = S(g,W,D)$. Then $\left|C(f,W) - \int_a^b g(x)\,dx\right| < \eta$. Because $\int_a^b g(x)\,dx$ is an upper bound
of $\mathcal{A}_f$, it follows that $C(f,W) \le \int_a^b g(x)\,dx$. Hence $\int_a^b g(x)\,dx - C(f,W) < \eta$.
Exercise 2.6.6 now implies that $\int_a^b g(x)\,dx = \operatorname{lub} \mathcal{A}_f$.


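Example 5.9.18. Let $f\colon [0,1] \to \mathbb{R}$ be defined by $f(x) = x^{3/2}$ for all $x \in [0,1]$. In
Example 5.9.13 we computed that the arc length of $f$ was approximately 1.436. We
now compute the exact arc length. As the reader knows informally from calculus, and
as we will see in Theorem 7.2.13 (1), the function $f$ is continuously differentiable,
and $f'(x) = \frac{3}{2}x^{1/2}$ for all $x \in [0,1]$. By Theorem 5.9.17 the arc length of $f$ is

$$
\begin{aligned}
L_0^1(f) &= \int_0^1 \sqrt{1 + \left[\tfrac{3}{2}x^{1/2}\right]^2}\,dx = \int_0^1 \sqrt{1 + \tfrac{9}{4}x}\,dx = \frac{4}{9}\int_1^{13/4} \sqrt{u}\,du \\
&= \frac{4}{9}\left[\frac{2}{3}u^{3/2}\right]_1^{13/4} = \frac{13\sqrt{13} - 8}{27} \approx 1.440.
\end{aligned}
$$

◊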
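
The relation between the polygonal sums of Example 5.9.13 and the exact arc length of Example 5.9.18 can be observed numerically. The following Python sketch is an illustration only: the helper `polygonal_sum` and the choice of uniform partitions with 4, 16, 64 and 256 subintervals are assumptions made here, chosen so that each partition is a refinement of the previous one.

```python
import math

def polygonal_sum(f, partition):
    """Polygonal sum C(f, P) for a partition given as an increasing list of points."""
    return sum(math.hypot(x1 - x0, f(x1) - f(x0))
               for x0, x1 in zip(partition, partition[1:]))

def f(x):
    return x ** 1.5

# Closed form from Theorem 5.9.17: (13 * sqrt(13) - 8) / 27.
exact = (13 * math.sqrt(13) - 8) / 27

sums = []
for n in (4, 16, 64, 256):
    P = [i / n for i in range(n + 1)]
    sums.append(polygonal_sum(f, P))
    print(n, sums[-1])

print("exact:", exact)
```

The polygonal sums increase with each refinement, as Lemma 5.9.14 predicts, and they approach the exact value from below, consistent with the arc length being the least upper bound of the polygonal sums.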


Although the integral in Example 5.9.18 was easy to compute, in practice the integral
in Theorem 5.9.17 is difficult to compute in closed form even for most simple functions,
though numerical approximations are always possible.


Reflections 


Whereas today it seems self-evident that we measure the area of regions in the 
plane by associating a number—called the area—to each such region, this idea of 
associating a number to represent the area of a region of the plane is in fact not 
obvious. The ancient Greeks did not associate a number called area to each region, 
but rather discussed the equality of areas of different regions by proving that one 
region could be broken up and rearranged into the other; if two regions had different 
areas then the two regions could be compared by using the ratio of their areas, without 
actually finding the areas of the individual regions. For example, there is no formula 
in Euclid’s Elements for the area of a circle in terms of the radius; rather, what Euclid 
proved about the areas of circles was that for any two circles, the ratio of their areas 
equals the ratio of the squares of their diameters. To the ancient Greeks, geometric 
objects (such as circles) could be compared only with similar types of objects, though 
it was possible to compare the ratio of two objects of one type with the ratio of two 
objects of another type. 

Even though today we have the notion of area, volume, arc length and the like as 
numerical values, and indeed these notions seem very clear intuitively, it turns out 
that providing rigorous definitions of these concepts is not at all trivial if we want to 
find the area of subsets of the plane more complicated than polygons, and similarly 
for the other concepts. Historically, people used concepts such as area and volume 
long before they were rigorously defined. The definition of area stated in the present 
section, which is the 2-dimensional version of the more general concept of Jordan
measure, is from the 19th century. 

Jordan measure is not the most common method used today to define the measure 
of subsets of Euclidean space; the more common method, known as Lebesgue measure, 
is better behaved than Jordan measure, but it is also trickier to define, and its use 
would take us too far afield to be included in this text. 

The discussion of area in this section made use of some elementary properties of 
rectangles, including the fact that if a rectangle with edges parallel to the coordinate 
axes is broken up into finitely many subrectangles by lines parallel to the coordinate 
axes, then the sum of the areas of the smaller rectangles will equal the area of the 
larger rectangle. This fact is simple to prove for such rectangles, but it raises the 
following more general question about polygons in the plane. If a polygon is broken 
up into finitely many smaller polygons, and the smaller polygons are rearranged into 
a new polygon, then the new polygon will have the same area as the original polygon, 
but is the converse true? That is, if two polygons have the same area, can one polygon 
be obtained from the other by breaking it up into finitely many smaller polygons 
and rearranging? The answer, as intuitively expected, is yes; that result is called the
Bolyai–Gerwien Theorem. David Hilbert, as one of his famous 23 problems from
1900, asked whether the analogous result is true for polyhedra in $\mathbb{R}^3$, and, rather
surprisingly, Max Dehn showed soon thereafter that the answer is no. That is, there
are two polyhedra in $\mathbb{R}^3$ that have the same volume, but one cannot be obtained from
the other by breaking one up into finitely many smaller polyhedra and rearranging.
See [Bol78] or [Pak, Chapters 15–17] for details in both the 2-dimensional and
3-dimensional cases.


Exercises 


Exercise 5.9.1. [Used in Section 5.9.] Let $S \subseteq \mathbb{R}^2$ be a rectangle. Then $S = [a,b] \times [c,d]$
for some closed bounded intervals $[a,b], [c,d] \subseteq \mathbb{R}$. Prove that $S$ is squarable, and that
its area using Definition 5.9.6 equals $(b - a)(d - c)$. [Use Exercise 2.6.2.]


Exercise 5.9.2. Let $K = \{(0,0)\} \cup \{(\frac{1}{n}, 0) \mid n \in \mathbb{N}\} \subseteq \mathbb{R}^2$. Either prove that $K$ is
squarable and find its area, or prove that $K$ is not squarable.


Exercise 5.9.3. [Used in Section 5.9 and Theorem 7.4.4.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate
closed bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that
$f(x) \ge 0$ for all $x \in [a,b]$. Prove that the region between the graph of $-f$ and the
x-axis is squarable if and only if the region under the graph of $f$ is squarable, and if
they are squarable then the areas of these two regions are equal.


Exercise 5.9.4. [Used in Exercise 5.9.7 and Theorem 7.4.4.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate
closed bounded interval, let $f, g\colon [a,b] \to \mathbb{R}$ be functions and let $k \in \mathbb{R}$. Let
$\hat{f}, \hat{g}\colon [a,b] \to \mathbb{R}$ be the functions defined by $\hat{f}(x) = f(x) + k$ and $\hat{g}(x) = g(x) + k$ for
all $x \in [a,b]$. Suppose that $f(x) \le g(x)$ for all $x \in [a,b]$. Prove that the region between
the graphs of $f$ and $g$ is squarable if and only if the region between the graphs of $\hat{f}$
and $\hat{g}$ is squarable, and if they are squarable then $A(R_a^b(\hat{f},\hat{g})) = A(R_a^b(f,g))$.


Exercise 5.9.5. [Used in Theorem 7.4.4.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, and let $f, g\colon [a,b] \to \mathbb{R}$ be functions. Let $\hat{f}, \hat{g}\colon [-b,-a] \to \mathbb{R}$ be the
functions defined by $\hat{f}(x) = f(-x)$ and $\hat{g}(x) = g(-x)$ for all $x \in [-b,-a]$. Suppose
that $f(x) \le g(x)$ for all $x \in [a,b]$. Prove that the region between the graphs of $f$ and $g$
is squarable if and only if the region between the graphs of $\hat{f}$ and $\hat{g}$ is squarable, and
if they are squarable then $A(R_{-b}^{-a}(\hat{f},\hat{g})) = A(R_a^b(f,g))$.


Exercise 5.9.6. [Used in Theorem 7.4.4.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, let $f, g\colon [a,b] \to \mathbb{R}$ be functions and let $c \in \mathbb{R}$. Let the functions
$\hat{f}, \hat{g}\colon [a+c, b+c] \to \mathbb{R}$ be defined by $\hat{f}(x) = f(x - c)$ and $\hat{g}(x) = g(x - c)$ for all
$x \in [a+c, b+c]$. Suppose that $f(x) \le g(x)$ for all $x \in [a,b]$. Prove that the region
between the graphs of $f$ and $g$ is squarable if and only if the region between the graphs
of $\hat{f}$ and $\hat{g}$ is squarable, and if they are squarable then $A(R_{a+c}^{b+c}(\hat{f},\hat{g})) = A(R_a^b(f,g))$.


Exercise 5.9.7. [Used in Theorem 5.9.11.] Prove Theorem 5.9.11. When you use 
Theorem 5.9.9 and Theorem 5.9.10, make sure that all of the hypotheses of each are 
satisfied. [Use Exercise 5.9.4.] 


Exercise 5.9.8. In Theorem 5.9.11 it is assumed that the functions $f$ and $g$ are integrable.
Is that necessary, or would it have sufficed to assume that $g - f$ is integrable?
In other words, is it possible to have functions $f, g\colon [a,b] \to \mathbb{R}$ such that $g - f$ is
integrable, but the region between the graphs of $f$ and $g$ is not squarable? If yes, give
an example. If not, prove why not.


Exercise 5.9.9. [Used in Lemma 5.9.14 and Exercise 5.9.10.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate
closed bounded interval, let $f\colon [a,b] \to \mathbb{R}$ be a function and let $P$ and $Q$
be partitions of $[a,b]$. The purpose of this exercise is to prove that if $Q$ is a refinement
of $P$, then $C(f,Q) \ge C(f,P)$. We start with some preliminaries, with the actual proof
given in Part (5) of this exercise.


(1) This part of the exercise is a special case of the Cauchy–Schwarz Inequality
for vectors in an inner product space. Let $(x_1,x_2), (y_1,y_2) \in \mathbb{R}^2$. Prove that

$$[x_1 y_1 + x_2 y_2]^2 \le [(x_1)^2 + (x_2)^2] \cdot [(y_1)^2 + (y_2)^2].$$

To prove the inequality, the first step is to let $p\colon \mathbb{R} \to \mathbb{R}$ be the function
defined by

$$p(t) = [(x_1)^2 + (x_2)^2]\,t^2 + 2[x_1 y_1 + x_2 y_2]\,t + [(y_1)^2 + (y_2)^2]$$

for all $t \in \mathbb{R}$, and to prove that $p(t) \ge 0$ for all $t \in \mathbb{R}$.

(2) This part of the exercise is a special case of the Triangle Inequality for vectors
in an inner product space. Let $(x_1,x_2), (y_1,y_2) \in \mathbb{R}^2$. Prove that

$$\sqrt{[x_1 + y_1]^2 + [x_2 + y_2]^2} \le \sqrt{[x_1]^2 + [x_2]^2} + \sqrt{[y_1]^2 + [y_2]^2}.$$

(3) Let $(u_1,u_2), (v_1,v_2), (w_1,w_2) \in \mathbb{R}^2$. Prove that

$$\sqrt{[w_1 - u_1]^2 + [w_2 - u_2]^2} \le \sqrt{[v_1 - u_1]^2 + [v_2 - u_2]^2} + \sqrt{[w_1 - v_1]^2 + [w_2 - v_2]^2}.$$

(4) Let $x, y, z \in [a,b]$. Suppose that $x < y < z$. Prove that

$$\sqrt{[z - x]^2 + [f(z) - f(x)]^2} \le \sqrt{[y - x]^2 + [f(y) - f(x)]^2} + \sqrt{[z - y]^2 + [f(z) - f(y)]^2}.$$

(5) Prove that if $Q$ is a refinement of $P$, then $C(f,Q) \ge C(f,P)$.


Exercise 5.9.10. [Used in Example 5.9.16.] Let $g\colon [0,3] \to \mathbb{R}$ be defined by


(x) x, ifxAl 
Kk) = 
° Hest, 


Prove that $g$ is rectifiable, and find $L_0^3(g)$. It is acceptable to use geometric reasoning,
such as the fact that the length of one edge of a triangle is always less than or equal 
to the sum of the lengths of the other two edges (this particular fact was proved in 
Exercise 5.9.9 (3)). 


Exercise 5.9.11. [Used in Theorem 7.4.3.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Prove that $-f$ is rectifiable if
and only if $f$ is rectifiable, and if they are rectifiable then $L_a^b(-f) = L_a^b(f)$.


Exercise 5.9.12. [Used in Theorem 7.4.3.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, let $f\colon [a,b] \to \mathbb{R}$ be a function and let $k \in \mathbb{R}$. Let $g\colon [a,b] \to \mathbb{R}$ be
defined by $g(x) = f(x) + k$ for all $x \in [a,b]$. Prove that $g$ is rectifiable if and only if $f$
is rectifiable, and if they are rectifiable then $L_a^b(g) = L_a^b(f)$.


Exercise 5.9.13. [Used in Theorem 7.4.3.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. Let $g\colon [-b,-a] \to \mathbb{R}$ be defined
by $g(x) = f(-x)$ for all $x \in [-b,-a]$. Prove that $g$ is rectifiable if and only if $f$ is
rectifiable, and if they are rectifiable then $L_{-b}^{-a}(g) = L_a^b(f)$.


Exercise 5.9.14. [Used in Theorem 7.4.3.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, let $f\colon [a,b] \to \mathbb{R}$ be a function and let $c \in \mathbb{R}$. Let
$h\colon [a+c, b+c] \to \mathbb{R}$ be defined by $h(x) = f(x - c)$ for all $x \in [a+c, b+c]$. Prove that $h$ is rectifiable if
and only if $f$ is rectifiable, and if they are rectifiable then $L_{a+c}^{b+c}(h) = L_a^b(f)$.


5.10 Historical Remarks 


The problem that motivated the development of integration, which is the need to 
find areas and volumes of non-rectilinear shapes, is the oldest aspect of calculus, 
having solid roots in the ancient world. It is evident that in many ancient cultures, 
activities such as art, architecture, farming, commerce, astronomy and more, would
lead to a practical need to find areas and volumes of a variety of shapes. Areas of 
rectangles, triangles, trapezoids and even circles were discussed in the ancient world 
in a variety of cultures; finding areas of more complicated shapes, however, required 
the development of sophisticated mathematical techniques. 


Ancient World 


The most important method for finding areas and volumes of complicated regions in 
the ancient world was the method of exhaustion, developed in ancient Greece. The 
work of Antiphon the Sophist (480-411 BCE), who found the areas of circles by 
using inscribed polygons with successive doubling of the number of edges, may have 
been the inspiration for the method of exhaustion. Bryson of Heraclea (c. 450-c. 390 
BCE) took this idea one step closer to the method of exhaustion by considering both 
inscribed and circumscribed polygons for circles. A classic example of the method of 
exhaustion is the proof of Proposition 2 of Book XII of Euclid’s Elements, which is 
also about the areas of circles. The idea is similar to the approach of Antiphon the 
Sophist, in that polygons and polyhedra with ever-increasing numbers of sides are 
used to approximate the region of interest, but to make the proof complete, rather than 
taking a limit to infinity (as we would do today but was never done in ancient Greece), 
or even using infinitesimals (as was done in the early development of calculus), 
the proof is based upon a double proof by contradiction (referred to as reductio ad 
absurdum), making use of the Axiom of Exhaustion, which is Proposition 1 of Book X
of the Elements. This method is attributed to Eudoxus of Cnidus (408-355 BCE). The 
term “exhaustion” is due to Grégoire de Saint-Vincent in 1647. 

The culmination of the method of exhaustion was the work of Archimedes (287–212 BCE),
who found areas and volumes of a variety of regions, for example the area
inside a parabola. The method of exhaustion was never formulated as a general method, 
however, and was used in an ad hoc fashion for each particular case. Archimedes’ 
use of the method of exhaustion was extremely clever and sophisticated, but it was 
also very tedious. Moreover, this method, which is a way of proving that a given 
area or volume is correct, hid any intuitive understanding of how the area or volume 
was first arrived at. When Archimedes’ The Method was found in 1906, it was a 
rare ancient Greek text that discussed how results were arrived at. In this particular 
case, Archimedes’ method of discovery was by viewing areas as made up of line 
elements. Archimedes hinted that Democritus may have used similar ideas. Hence, 
it might have been the case that some ancient Greeks thought intuitively in terms of 
infinitesimals, though then wrote up their proofs in the standard ancient Greek style. 
However, whereas Archimedes’ methods have some resemblance to certain aspects of 
calculus, for example the use of upper sums and lower sums, it is not correct to say 
that Archimedes was essentially doing calculus—he did not have the general idea of 
derivatives or integrals, nor the relation between tangent problems and area problems, 
and he did not provide broadly applicable computational tools. 


Medieval Period


The Arab mathematicians in the Middle Ages understood the method of exhaustion as 
described in Euclid’s Elements, though European mathematicians at the time did not. 
In particular, Abu Ali al-Hasan ibn al-Haytham (965-1039), also known as Alhazen, 
calculated volumes of some solids of revolution. Archimedes had looked at rotating 
a parabolic segment about its axis, but Alhazen rotated it about some other lines as 
well. 

Nicole Oresme (1323-1382) had the idea of studying variation via representation 
by coordinates, a precursor to the subsequent invention of analytical geometry. He 
appears to have understood, in special cases, that if the velocity of an object is 
graphed, then the area under the graph represents the distance traveled, which is the 
essential idea of the Fundamental Theorem of Calculus. Oresme also had the idea of 
mathematical indivisibles, which were used for area and volume calculations a few 
centuries later. 


Renaissance 


As mathematics in Europe started to become more sophisticated, in part due to
the importation of ideas from the Arab world, the work of Archimedes became
known and appreciated, though such knowledge had a mixed impact. On the one 
hand, Archimedes’ calculations of some areas and volumes provided an inspiration to 
compute even more areas and volumes. On the other hand, the method of exhaustion, 
based upon the use of reductio ad absurdum to avoid infinitesimals or limits, became 
an unbearably heavy burden in all but the simplest cases. 

One of the people who helped promote a more computationally friendly method for 
computing areas and volumes than the strict Archimedean approach was Simon Stevin 
(1548-1620). As an engineer focused on getting results, Stevin, in De Beghinselen 
der Weeghconst of 1586, tried to modify the method of exhaustion by replacing the 
reductio ad absurdum with some simplifying ideas, for example the fact that if two 
quantities differ they do so by a finite amount, and therefore in order to show that two 
quantities are equal it suffices to show that they differ by less than any finite amount. 
(This fact, in modern notation, follows from Lemma 2.3.10.) 


Seventeenth Century 


Mathematicians in the 17th century continued to abandon the method of exhaustion, 
and started to make use of infinitesimals and indivisibles to find areas and volumes 
(limits in their modern form were not used for another two centuries). One of the 
first to do so was Johannes Kepler (1571-1630), who is best known for his three
laws of planetary motion, but who also did work in geometry. As part of a calculation 
concerning planetary motion, Kepler showed what we write as $\int_0^a \sin x\,dx = 1 - \cos a$
by dividing the region into infinitely many small parts. In Nova stereometria doliorum 
vinariorum of 1615, which was meant to be of practical use for finding volumes of 
wine casks, Kepler computed volumes of solids of revolution by using infinitesimals 
freely to obtain his results. 

An important use of indivisibles for finding areas and volumes was the work
of Bonaventura Cavalieri (1598-1647), who was influenced by the approach of his 
teacher Galileo Galilei (1564-1642). Cavalieri’s Geometria indivisibilibus contin- 
uorum nova quadam ratione promota of 1635 was the first book entirely about the 
use of indivisibles, and Exercitationes geometricae sex of 1647 elaborated upon his 
approach; these books were very influential. Although Kepler and Cavalieri both 
found areas and volumes by breaking up regions into infinitely small pieces, their 
approaches were different, in that Cavalieri did not directly add up the infinitely 
many pieces as did Kepler, but rather found areas and volumes by comparing the 
sizes of the slices of different shapes. More specifically, Cavalieri used what we now 
call “Cavalieri’s Principle,” which says that if two regions in the plane are located 
between two parallel lines, and if every line parallel to these two lines intersects 
the two regions in line segments of equal lengths, then the two regions have equal 
area; the analogous result holds for regions in space. The use of indivisibles, which 
harked back to Oresme and other medieval scholars, essentially hid the limits involved 
in finding areas and volumes. Cavalieri’s Principle was touched upon by Heron of 
Alexandria (c. 10-c. 75) and known to Galileo, but it was Cavalieri who took this idea
and made it into a workable tool for finding areas and volumes. Cavalieri calculated,
though not rigorously, what we write as $\int_0^a x^n\,dx = \frac{a^{n+1}}{n+1}$ for all $n \in \mathbb{N}$. He used clever
reasoning going back and forth among dimensions, for example viewing what we
write as $\int_0^a x^2\,dx$ as both the area under a parabola and the volume of a pyramid with
square cross section. He argued the result up to n = 9 on an ad hoc basis, and inferred
the general result.

In addition to Cavalieri, who published it first in 1639, other mathematicians in the
period 1635-1655 independently came up with $\int_0^a x^n\,dx = \frac{a^{n+1}}{n+1}$ for all $n \in \mathbb{N}$, including
Torricelli, Roberval, Pascal, Fermat and Wallis. Rather than using the geometric
approach of Cavalieri, this integral was justified by Fermat, Pascal and Roberval by
using upper sums and lower sums, and using the limit

$$\lim_{n \to \infty} \frac{1^k + 2^k + \cdots + n^k}{n^{k+1}} = \frac{1}{k+1}$$

for all $k \in \mathbb{N}$, though the idea of a limit was not yet rigorous. Such calculations by algebraic
manipulation were a step forward toward calculus.

Pierre de Fermat (1601-1665), though a lawyer by profession, was in correspondence
with many contemporary mathematicians such as Mersenne, Roberval and
Descartes, and his mathematical work included both tangent problems and area problems.
He correctly evaluated the integral $\int_0^a x^{p/q}\,dx$, using a subdivision of the interval
(0, a] by geometric series. He was among the first to notice, though only in special 
cases, a link between tangent problems and area problems. Fermat appears to have 
anticipated the invention of calculus more than most of his contemporaries, in that he 
had some of the ingredients of both derivatives and integrals, but he did not appear 
to recognize either the derivative or the integral as a concept in its own right, and he 
cannot be considered as having invented calculus. 

Gilles de Roberval (1602-1675) was less concerned with rigor than Fermat, 
but he was very ingenious, obtaining, for example, various trigonometric integrals. 
Roberval also saw a relation, in some special cases, between tangent problems and 
area problems. 

The work of Blaise Pascal (1623-1662) on areas and volumes was not as original
as some of his contemporaries, but he was interested in exposition, and his writing on 
this matter influenced Leibniz. 

Evangelista Torricelli (1608-1647) solved a variety of tangent and area problems, 
including the area under the curve $\left(\frac{x}{a}\right)^m \left(\frac{y}{b}\right)^n = 1$, which was proposed by Fermat.
He made use of the idea of having inscribed and circumscribed figures differ by 
less than a given magnitude in order to find areas (which is similar to the idea of 
Lemma 5.9.5 (3)). Torricelli had an informal understanding of the relation between 
tangent problems and area problems, though he did not explicitly state the Fundamental
Theorem of Calculus as we know it. 

René Descartes (1596-1650), in La Géométrie of 1637, stated that he thought 
it would not be possible for the human mind to determine the exact arc lengths 
of curves. He was proved wrong by a number of people, including William Neile
(1637-1670), who found the arc length of the curve $y = cx^{3/2}$ in 1657, the architect
and mathematician Christopher Wren (1632-1723), who found the arc length of the
cycloid in 1658, and Hendrik van Heuraet (1634-1660), who found the geometric 
equivalent of the general formula for arc length in 1659. All three used the same 
approach we use today, which is to approximate the curve by polygons, and then use 
either an infinitesimal or limit argument. Some of the ideas of van Heuraet, including 
the differential triangle and the association of the area under a new curve with the arc 
length of the original curve, helped lead to the subsequent discovery of the general 
relation between tangent problems and area problems. 

In Geometriae pars universalis of 1668, James Gregory (1638-1675) compiled 
the known tangent, area and volume calculations from a variety of writers such as 
Cavalieri, Torricelli and others. Gregory’s writing, though verbal and geometric rather 
than analytical, helped put the focus on general methods by separating them from 
special cases, thereby eliminating a lot of the repetition found in the work of earlier 
mathematicians. He was aware of a special case of the inverse relation between tangent 
problems and area problems, in the context of a discussion of arc length of curves. 
Geometriae pars universalis could be considered to have the first published version 
of the Fundamental Theorem of Calculus, albeit in a special case, and in a geometric 
form as a relation between tangents and areas, and not in terms of differentiation and 
integration of functions. 

John Wallis (1616-1703) introduced $x^k$ where k is negative and/or rational in
Arithmetica infinitorum of 1656. This work had an influence on Newton. Wallis
conjectured the correct result for $\int_0^a x^{p/q}\,dx$, and proved it for $\frac{1}{q}$. Wallis’ approach was
to use algebraic, rather than geometric, methods.

Isaac Barrow (1630-1677), like Gregory, worked in a geometric vein, and his 
work, while not yet calculus as we know it, can be thought of as the end of the 
development of the geometric approach prior to the invention of calculus. Barrow, 
Newton’s predecessor as the Lucasian Professor of Mathematics at Cambridge, possi- 
bly influenced the young Newton’s mathematical development; conversely, Newton 
made some suggestions for Barrow’s book Lectiones geometricae of 1670. In this 
book, Barrow, who thought in terms of time and motion, and who built upon the ideas 
of the medievalists, as well as Galileo, Cavalieri, Torricelli and Roberval, among
others, had geometric statements of both versions of the Fundamental Theorem of 
Calculus. These results were expressed in terms of areas under curves and tangent 
lines to curves, rather than in terms of integrals and derivatives. However, though he 
had a definite understanding of the Fundamental Theorem of Calculus, Barrow did 
not exploit this understanding to provide a method for computing areas under curves, 
and in general did not transform the various available ideas about tangents and areas 
into a practical computational tool; as such, Barrow cannot be said to have invented 
calculus, though he was close. 


Newton and Leibniz 


Though it took two more centuries until calculus was brought into the form we know it 
today, the essence of calculus was first thought of by Isaac Newton (1643-1727) in the 
period 1665-1666 while home from Cambridge University because of the plague. In 
the unpublished October 1666 Tract on Fluxions Newton used an intuitive argument 
to show what we would phrase by saying that if A is the area under the curve y = f(x) 
then $\frac{dA}{dx} = y$, thereby establishing the Fundamental Theorem of Calculus Version I.
Newton computed a table of antiderivatives, in part using Integration by Substitution, 
where some of the antiderivatives were given explicitly and others were reduced to 
hyperbolic or circular functions, which Newton then handled with the binomial series 
(which he had previously discovered). He then used these ideas to solve some area 
problems essentially as we do today, and that was the birth of calculus. 

In the unpublished Tractatus de methodis serierum et fluxionum of 1671, Newton 
further elaborated upon what amounts to Integration by Substitution, implicitly used 
the equivalent of Integration by Parts, and worked out the arc length formula and 
computed some examples, such as $y = \frac{x^3}{a^2} + \frac{a^2}{12x}$ and $y = ax^{3/2}$; for more complicated
functions, he worked out the arc lengths in terms of power series. 

Gottfried von Leibniz (1646-1716) started working toward his version of calculus, 
presumably independently of Newton in spite of all the controversy, in late 1673 or 
early 1674, when he discovered a version of the inverse relation of tangent problems 
and area problems, by using something like Cavalieri’s Principle to show that an 
area under one curve is equivalent to the area under a certain associated curve, 
the construction of which involves tangents. In modern terms, he ended up doing 
something like a special case of Integration by Parts. He used this method to solve 
some area problems, and to obtain the series $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$. Leibniz worked
out his version of calculus in the latter part of 1675, as recorded in a series of 
unpublished notes, by the end of which he had the $\int$ and $dx$ notations, and he had
solved some non-trivial problems. Leibniz thought of dx and dy as infinitely small 
changes in x and y, respectively; he seemed to think of curves as polygons with 
infinitesimal edges. He viewed the notation $\int y\,dx$ (which we write as $\int_a^b y(x)\,dx$) as
meaning the sum of infinitesimal rectangles with heights y and widths dx. He stated 
the Fundamental Theorem of Calculus Version I, and he had the formula for arc 
length. Leibniz published his calculus in three papers in 1684, 1686 and 1693, after 
Newton had written some manuscripts, but before he published his version of calculus
in 1704. 

Newton and Leibniz had different approaches to integrals. Leibniz viewed integrals 
as a type of sum, and he viewed differentials as a type of difference; by analogy with 
the finite case, Leibniz thought of integrals as inverse to differentials. Hence, Leibniz 
viewed integrals as separately defined from derivatives. For Newton integrals are 
what we call indefinite integrals, though he solved area problems with them, by 
looking at rates of change of what we write as $\int_a^x f(t)\,dt$. Ultimately, both Newton
and Leibniz arrived at the inverse relation of area problems and tangent problems, 
and they exploited this relation to provide simple methods for solving a variety of 
problems. 


Eighteenth Century 


Although integrals (in the guise of areas) were originally viewed as some sort of 
infinite sum, after Newton and Johann Bernoulli (1667-1748) the view switched 
to viewing integrals as antiderivatives, which resulted in some confusion about the 
relation of what we now call definite integrals and indefinite integrals. Because 
integrals were viewed as antiderivatives, derivatives became the central object of 
calculus. The idea of area at the time was rather intuitive, though perhaps because 
Newton’s approach seemed to work for nice functions, and poorly behaved functions 
were generally avoided, no one felt compelled to look any further at the meaning of 
the concept of area. 

By the time of Joseph Fourier (1768-1830), however, it was realized that integrals 
of more complicated functions, which arose in real-world situations, were needed. 
Moreover, Fourier, who wanted to calculate what we call Fourier coefficients, changed 
the focus from indefinite integrals back to definite integrals. Fourier introduced the 
notation $\int_a^b f(x)\,dx$; Leibniz used only the notation $\int f(x)\,dx$. Cauchy was influenced
by Fourier in focusing on definite integrals, though Cauchy’s definition of such 
integrals was precise, rather than Fourier’s informal notion of area. 


Nineteenth Century 


Augustin Louis Cauchy (1789-1857) was the first person to provide a rigorous 
treatment of integrals, meaning definite integrals, separately from their relation to 
derivatives. Cauchy used left-hand sums to define integrals; he did not invent
such sums (Euler and others had used them to approximate integrals), but Cauchy 
was the first person to use them in the definition of integrals. Cauchy aimed to prove 
that definite integrals of continuous functions always exist. He used partitions and 
refinements of partitions, just as we do today. Cauchy had some logical gaps in his 
argument, glossing over the difference between continuity and uniform continuity, and 
implicitly using results that could only be proved with the axioms for the real numbers, 
but the core ideas he used are familiar to modern students of calculus. Cauchy gave 
the first rigorous statements and proofs of the two versions of the Fundamental 
Theorem of Calculus; his arguments, which are still used, work for continuous, and 
also piecewise continuous, functions. Because Cauchy restricted his attention to nicely
behaved functions, however, he did not give a general characterization of integrability. 

The next major step forward in the development of integration was due to Georg 
Friedrich Bernhard Riemann (1826-1866). In his Habilitationsschrift Über die Darstellbarkeit
einer Function durch eine trigonometrische Reihe of 1854 (published in 1867),
Riemann looked at the representation of functions as Fourier series, and in the course 
of his study, where he wanted to look at functions that were not necessarily con- 
tinuous, he recognized the need to give a more precise definition of integrability 
to accommodate functions that are not continuous (because Fourier coefficients are 
computed via integrals). Riemann then defined what we now call Riemann sums, and 
essentially gave our modern definition of integrals in terms of such sums (though 
without the $\varepsilon$-$\delta$ formulation). Riemann gave criteria that are equivalent to integrability
for bounded functions, including the equivalent of looking at the difference between 
upper sums and lower sums, though upper sums and lower sums were introduced 
only in the 1870s by several mathematicians, including Gaston Darboux (1842-1917). 
Riemann also gave an example of a function with a dense set of discontinuities that 
is integrable. Similarly to Cauchy, Riemann ignored issues that today we know rely 
upon the axiomatic properties of the real numbers. 

In 1875 Darboux looked at upper integrals and lower integrals, though using 
limits rather than Least Upper Bounds, and he proved what we call the Fundamental 
Theorem of Calculus Version I. Giuseppe Peano (1858-1932) pointed out that it 
would be possible to use greatest lower bounds and least upper bounds instead of 
limits in the definition of upper integrals and lower integrals, as we do now. 

The first person to give a rigorous definition of area was Peano, in 1887. He 
considered the inner area, denoted $a_i(S)$, of a planar region S as the least upper
bound of the areas of all polygons contained in the region, and similarly for the
outer area, denoted $a_o(S)$. Clearly $a_i(S) \le a_o(S)$. The area was said to exist if and
only if $a_i(S) = a_o(S)$, and if equality held this number was defined to be the area.
Peano showed that $a_i(S) = \underline{\int_a^b} f(x)\,dx$ and $a_o(S) = \overline{\int_a^b} f(x)\,dx$ for the region below the
graph of a non-negative function; it then followed that the region under the graph of a 
non-negative function has an area if and only if the function is integrable, in which 
case the area is the integral. Peano’s definition of area is called Jordan content today, 
due to the further development of the subject by Camille Jordan (1838-1922) in 1893. 
Jordan used only polygons with horizontal and vertical sides, which is equivalent to 
what we do in this text. 


Twentieth Century 


A major step forward in the study of integration was due to Henri Lebesgue (1875- 
1941) in the early 20th century. He proved what we now call Lebesgue’s Theorem, 
which is a characterization of which functions are Riemann integrable, and he defined 
what we call the Lebesgue integral, which equals the Riemann integral for Riemann 
integrable functions, but which exists for many functions that are not Riemann 
integrable (for example the Dirichlet function defined in Example 3.3.3 (6)), and 
which has some very convenient properties not obeyed by the Riemann integral. The 
Lebesgue integral is defined in terms of Lebesgue measure, which supersedes the use
of Jordan content. Other types of integrals were also defined in the 20th century, for 
example the Henstock-Kurzweil integral.


6 


Limits to Infinity 


6.1 Introduction 


When we studied limits of functions in Chapter 3, we often considered expressions
of the form “$\lim_{x \to c} f(x) = L$,” where the symbols c and L both denoted real numbers.
However, it is also possible to consider limits involving not only real numbers,
but limits to “infinity” and “negative infinity,” which are written $\lim_{x \to \infty} f(x) = L$, and
$\lim_{x \to -\infty} f(x) = L$, and $\lim_{x \to c} f(x) = \infty$, and $\lim_{x \to c} f(x) = -\infty$. It is also possible to combine
these two types of limits, for example $\lim_{x \to \infty} f(x) = \infty$. In all of the types of limits that
involve $\infty$ and $-\infty$, it is important to recognize that the symbols “$\infty$” and “$-\infty$” are not
real numbers, but are rather a shorthand way of indicating that something is growing
without bound either in the positive direction or in the negative direction.

We will define the different types of limits to infinity in Section 6.2. In Sections 6.3 
and 6.4 we discuss two very useful topics involving limits to infinity, both of which 
appear in calculus courses, namely, l’Hôpital’s Rule and improper integrals. Limits to
infinity have many applications; for example, we will use such limits, and improper
integrals, in our discussion of trigonometric functions and $\pi$ in Sections 7.3 and
7.4. Other useful applications include Laplace transforms, which in turn are used for 
solving differential equations, and continuous probability; such topics are beyond 
the scope of this book. See [BD09, Chapter 6] for Laplace transforms as used for 
differential equations, and see [Ros10, Chapter 5] for continuous probability. 

As was the case in previous chapters, here too we assume that the reader is 
informally familiar with the standard elementary functions and their basic properties, 
such as continuity and differentiability, in order to have sufficiently many functions 
to see interesting examples of the material in this chapter. We will see a rigorous 
treatment of the elementary functions in Chapter 7; the proofs in that chapter, though 
making use of some of the general ideas in the present chapter, will not make use of 
the examples in the present chapter, and there is no circular reasoning. 


6.2 Limits to Infinity


We define two types of limits to infinity. The first type, which we will refer to as
Type 1 limits to infinity, and which is denoted $\lim_{x \to \infty} f(x) = L$ or $\lim_{x \to -\infty} f(x) = L$, has
x go to infinity or negative infinity, but has the value of f(x) go to a real number
L. The second type, which we will refer to as Type 2 limits to infinity, and which
is denoted $\lim_{x \to c} f(x) = \infty$ or $\lim_{x \to c} f(x) = -\infty$, has x go to a real number c, but has the
value of f(x) go to infinity or negative infinity. Type 1 limits to infinity correspond to
horizontal asymptotes of graphs of functions, as seen in Figure 6.2.1 (i), and Type 2
limits correspond to vertical asymptotes, as seen in Part (ii) of the figure.


Fig. 6.2.1. Panels (i) and (ii).


We start our discussion with Type 1 limits to infinity. In the ordinary type of limit,
denoted $\lim_{x \to c} f(x) = L$, we mean intuitively that f(x) gets closer and closer to a number
L as the value of x gets closer and closer to a number c. By contrast, in a limit of the
form $\lim_{x \to \infty} f(x) = L$, the idea is that f(x) gets closer and closer to a number L as the
value of x gets larger and larger, which is symbolically denoted by “$x \to \infty$,” though
there is no real number “$\infty$” that the number x is getting closer and closer to. Hence,
in our definition of this type of limit, we replace the expression “$|x - c| < \delta$,” which
is thought of as x being near c, with the expression “$x > M$,” which is thought of as x
being large.

For the following definition, recall the definition of right unbounded interval and 
left unbounded interval given in Definition 2.3.6. 


Definition 6.2.1.


1. Let $I \subseteq \mathbb{R}$ be a right unbounded interval, let $f\colon I \to \mathbb{R}$ be a function and let
$L \in \mathbb{R}$. The number L is the limit of f as x goes to infinity, written

$$\lim_{x \to \infty} f(x) = L,$$

if for each $\varepsilon > 0$, there is some $M \in \mathbb{R}$ such that $x \in I$ and $x > M$ imply
$|f(x) - L| < \varepsilon$. If $\lim_{x \to \infty} f(x) = L$, we also say that f converges to L as x goes
to infinity. If f converges to some real number as x goes to infinity, we say
that $\lim_{x \to \infty} f(x)$ exists.

2. Let $I \subseteq \mathbb{R}$ be a left unbounded interval, let $f\colon I \to \mathbb{R}$ be a function and let
$L \in \mathbb{R}$. The number L is the limit of f as x goes to negative infinity, written

$$\lim_{x \to -\infty} f(x) = L,$$

if for each $\varepsilon > 0$, there is some $P \in \mathbb{R}$ such that $x \in I$ and $x < P$ imply
$|f(x) - L| < \varepsilon$. If $\lim_{x \to -\infty} f(x) = L$, we also say that f converges to L as x goes
to negative infinity. If f converges to some real number as x goes to negative
infinity, we say that $\lim_{x \to -\infty} f(x)$ exists. △

As was the case for ordinary limits, here too we need to prove that if $\lim_{x \to \infty} f(x) = L$
or $\lim_{x \to -\infty} f(x) = L$ for some $L \in \mathbb{R}$, then there is only one such number L. The analog
of the following lemma for limits to negative infinity also holds, though for the sake
of brevity we will not state it.


Lemma 6.2.2. Let $I \subseteq \mathbb{R}$ be a right unbounded interval, and let $f\colon I \to \mathbb{R}$ be a
function. If $\lim_{x \to \infty} f(x) = L$ for some $L \in \mathbb{R}$, then L is unique.


Proof. Left to the reader in Exercise 6.2.2. 


Because of Lemma 6.2.2 we can refer to “the” limit of a function as x goes to 
infinity, if the limit exists, and similarly for limits to negative infinity. 


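Example 6.2.3.
(1) We will prove that $\lim_{x \to \infty} \frac{5x+4}{2x+3} = \frac{5}{2}$. (In principle, we should have stated that
the function under consideration is $f\colon (-\frac{3}{2}, \infty) \to \mathbb{R}$ defined by $f(x) = \frac{5x+4}{2x+3}$ for all
$x \in (-\frac{3}{2}, \infty)$, but that is implicitly clear, and we will not write out the name of the
function in other similar situations.)

Let $\varepsilon > 0$. Let $M = \frac{7}{4\varepsilon}$. Suppose that $x \in (-\frac{3}{2}, \infty)$ and $x > M$. Then $x > \frac{7}{4\varepsilon}$, which
means that $x > 0$, and hence $\frac{7}{4x} < \varepsilon$. Then

$$\left|\frac{5x+4}{2x+3} - \frac{5}{2}\right| = \left|\frac{-7}{4x+6}\right| = \frac{7}{4x+6} < \frac{7}{4x} < \varepsilon.$$

(2) We will prove that $\lim_{x \to \infty} x^2$ does not exist. Suppose that $\lim_{x \to \infty} x^2 = L$ for some
$L \in \mathbb{R}$. Let $\varepsilon = 1$. Let $M \in \mathbb{R}$. Let $x = \max\{M+1, L+1, 1\}$. Then $x > M$. There
are now two cases. First, suppose that $L > 0$. Because $x \ge L+1 > 1$, it follows that
$x^2 \ge (L+1)^2 > L+1$. Hence $x^2 - L > 1$, and therefore $|x^2 - L| > 1 = \varepsilon$. Second,
suppose that $L \le 0$. Because $x \ge 1$, then $x^2 \ge 1$, and it follows that $x^2 - L \ge 1$,
which again implies that $|x^2 - L| \ge 1 = \varepsilon$. Putting the two cases together, we see that
$|x^2 - L| \ge \varepsilon$, which is a contradiction to the hypothesis that $\lim_{x \to \infty} x^2 = L$. ♦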
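
The two arguments in Example 6.2.3 can be spot-checked numerically. The following is only a sketch, not a proof: it assumes the function in Part (1) is (5x + 4)/(2x + 3) with cutoff M = 7/(4ε), and the helper names `threshold` and `witness`, as well as the sample values, are our own choices for illustration.

```python
def f(x):
    """The function of Example 6.2.3 (1), assumed to be (5x + 4)/(2x + 3)."""
    return (5 * x + 4) / (2 * x + 3)


def threshold(eps):
    """The cutoff M = 7/(4*eps) chosen in the proof (eps must be positive)."""
    return 7 / (4 * eps)


def witness(M, L):
    """The point x = max{M + 1, L + 1, 1} used in Example 6.2.3 (2)."""
    return max(M + 1, L + 1, 1)


# Part (1): every sampled x > M satisfies |f(x) - 5/2| < eps.
part1_ok = True
for eps in (1.0, 0.1, 0.001):
    M = threshold(eps)
    for step in (0.001, 1, 10, 1000, 10**6):
        x = M + step
        if not abs(f(x) - 2.5) < eps:
            part1_ok = False

# Part (2): for any candidate limit L and cutoff M, the witness x exceeds M
# and x**2 stays at least eps = 1 away from L.
part2_ok = True
for L in (-50, -1, 0, 0.5, 3, 1000):
    for M in (-10, 0, 7, 10**4):
        x = witness(M, L)
        if not (x > M and abs(x * x - L) >= 1):
            part2_ok = False

print(part1_ok, part2_ok)
```

A finite sample of this kind can only fail to find a counterexample; the inequalities in the example are what guarantee the result for all x > M.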

Many of the lemmas and theorems that were proved for ordinary limits in Section 3.2
have analogs for Type 1 limits to infinity. We will not state and prove all such
results, but as a useful example we will prove the following theorem; the analog of 
this theorem also holds for limits to negative infinity, though again for the sake of 
brevity we will not state it. The reader is urged to observe the exact analogy between 
not only the statements of Theorem 3.2.10 and Theorem 6.2.4, but also the proofs of 
these theorems. 


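Theorem 6.2.4. Let $I \subseteq \mathbb{R}$ be a right unbounded interval, let $f, g\colon I \to \mathbb{R}$ be functions
and let $k \in \mathbb{R}$. Suppose that $\lim_{x \to \infty} f(x)$ and $\lim_{x \to \infty} g(x)$ exist.


1. $\lim_{x \to \infty} [f + g](x)$ exists and $\lim_{x \to \infty} [f + g](x) = \lim_{x \to \infty} f(x) + \lim_{x \to \infty} g(x)$.
2. $\lim_{x \to \infty} [f - g](x)$ exists and $\lim_{x \to \infty} [f - g](x) = \lim_{x \to \infty} f(x) - \lim_{x \to \infty} g(x)$.
3. $\lim_{x \to \infty} [kf](x)$ exists and $\lim_{x \to \infty} [kf](x) = k \lim_{x \to \infty} f(x)$.
4. $\lim_{x \to \infty} [fg](x)$ exists and $\lim_{x \to \infty} [fg](x) = [\lim_{x \to \infty} f(x)] \cdot [\lim_{x \to \infty} g(x)]$.
5. If $\lim_{x \to \infty} g(x) \ne 0$, then $\lim_{x \to \infty} \left[\frac{f}{g}\right](x)$ exists and $\lim_{x \to \infty} \left[\frac{f}{g}\right](x) = \frac{\lim_{x \to \infty} f(x)}{\lim_{x \to \infty} g(x)}$.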


Proof. We will prove Parts (1) and (4), leaving the rest to the reader in Exercise 6.2.3.
Let $L = \lim_{x \to \infty} f(x)$ and $M = \lim_{x \to \infty} g(x)$.


(1) Let $\varepsilon > 0$. Then there is some $P \in \mathbb{R}$ such that $x \in I$ and $x > P$ imply
$|f(x) - L| < \frac{\varepsilon}{2}$, and there is some $Q \in \mathbb{R}$ such that $x \in I$ and $x > Q$ imply $|g(x) - M| < \frac{\varepsilon}{2}$. Let
$R = \max\{P, Q\}$. Suppose that $x \in I$ and $x > R$. Then

$$|[f + g](x) - (L + M)| = |(f(x) - L) + (g(x) - M)| \le |f(x) - L| + |g(x) - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$


(4) Let $\varepsilon > 0$. There is some $P \in \mathbb{R}$ such that $x \in I$ and $x > P$ imply $|g(x) - M| < 1$.
Using Lemma 2.3.9 (7) we see that $x \in I$ and $x > P$ imply $|g(x)| - |M| < 1$, and hence
$|g(x)| < |M| + 1$. Observe that $|L| + |M| + 1 > 0$. There is some $Q \in \mathbb{R}$ such that $x \in I$
and $x > Q$ imply $|f(x) - L| < \frac{\varepsilon}{|L| + |M| + 1}$, and there is some $R \in \mathbb{R}$ such that $x \in I$ and
$x > R$ imply $|g(x) - M| < \frac{\varepsilon}{|L| + |M| + 1}$. Let $S = \max\{P, Q, R\}$. Suppose that $x \in I$ and
$x > S$. Then

$$|[fg](x) - LM| = |f(x)g(x) - LM| = |f(x)g(x) - g(x)L + g(x)L - LM|$$
$$\le |g(x)| \cdot |f(x) - L| + |L| \cdot |g(x) - M|$$
$$< (|M| + 1) \cdot \frac{\varepsilon}{|L| + |M| + 1} + |L| \cdot \frac{\varepsilon}{|L| + |M| + 1} = \varepsilon.$$
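
The choice of cutoffs in the proof of Part (4) can be traced on a concrete pair of functions. The sketch below is an illustration under our own assumptions: the functions f(x) = L + 1/x and g(x) = M + 1/x on (0, ∞), the helper names `product_cutoff` and `product_error`, and the sampled values are hypothetical and do not come from the text. For these functions |f(x) − L| = |g(x) − M| = 1/x, so each cutoff in the proof can be written down explicitly.

```python
def product_cutoff(L, M, eps):
    """Return S = max{P, Q, R} as in the proof of Part (4), for the
    hypothetical functions f(x) = L + 1/x and g(x) = M + 1/x on (0, oo).

    For these functions |f(x) - L| = |g(x) - M| = 1/x, so the bound
    1/x < eta holds as soon as x > 1/eta.
    """
    D = abs(L) + abs(M) + 1      # the positive constant |L| + |M| + 1
    P = 1.0                      # x > P gives |g(x) - M| < 1
    Q = D / eps                  # x > Q gives |f(x) - L| < eps / D
    R = D / eps                  # x > R gives |g(x) - M| < eps / D
    return max(P, Q, R)


def product_error(L, M, x):
    """|[fg](x) - LM| for f(x) = L + 1/x and g(x) = M + 1/x."""
    f = L + 1 / x
    g = M + 1 / x
    return abs(f * g - L * M)


# Sample points beyond the cutoff S and check the conclusion of the proof.
all_ok = True
for L, M in ((2, 3), (-4, 0.5), (0, 0), (10, -7)):
    for eps in (1.0, 0.01):
        S = product_cutoff(L, M, eps)
        for factor in (1.001, 2, 50, 10**4):
            x = S * factor
            if not product_error(L, M, x) < eps:
                all_ok = False

print(all_ok)
```

The cutoff P is needed only to bound |g(x)| by |M| + 1; the cutoffs Q and R then make each of the two terms in the final estimate small enough.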


We now turn to Type 2 limits to infinity, including one-sided limits of this type. On
the one hand, there is a certain technical similarity between Type 1 and Type 2 limits
to infinity, though with the roles of the “x” and “y” coordinates reversed. Conceptually,
however, there is a substantial difference between these two types of limits to infinity.
For a Type 1 limit to infinity, when we write “$\lim_{x \to \infty} f(x) = L$,” we say that f converges
to L as x goes to infinity. By contrast, for a Type 2 limit to infinity, when we write
“$\lim_{x \to c} f(x) = \infty$,” we mean that f(x) is growing larger and larger as x gets closer and
closer to c, which implies in particular that there is no $L \in \mathbb{R}$ that the values of f(x) are
approaching. Hence, when we write “$\lim_{x \to c} f(x) = \infty$,” we say that f diverges to infinity
as x goes to c. It might seem strange to say that f “diverges to” something, because
divergence means a lack of convergence, and convergence means getting closer and
closer to something. However, there are a variety of ways in which a function f can
be divergent as x goes to a number c. Compare, for example, the behavior near 0 of
the functions $f, g\colon \mathbb{R} - \{0\} \to \mathbb{R}$ defined by $f(x) = \frac{1}{x^2}$ for all $x \in \mathbb{R} - \{0\}$, seen in
Figure 6.2.2 (i), and $g(x) = \sin \frac{1}{x}$ for all $x \in \mathbb{R} - \{0\}$, seen in Part (ii) of the figure.
The function f diverges as x goes to 0 because the values of f(x) are getting larger
and larger, whereas the function g diverges as x goes to 0 because the values of
g(x) oscillate more and more frequently. In both cases the values of the function are not
approaching a single real number, and hence both $\lim_{x \to 0} f(x)$ and $\lim_{x \to 0} g(x)$ do not exist,
but the reasons for the non-existence are quite different. Hence, when we say that
“f diverges to infinity as x goes to 0,” we are saying something about the type of
divergence.


Fig. 6.2.2. Panels (i) and (ii).
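
The contrast between the two kinds of divergence can also be seen numerically. The following sketch assumes the functions are f(x) = 1/x² and g(x) = sin(1/x); the sample points, the tolerance 1e-9 and the variable names are our own choices. It evaluates both functions along two sequences of points approaching 0.

```python
import math


def f(x):
    """f(x) = 1/x**2, which grows without bound near 0."""
    return 1 / x**2


def g(x):
    """g(x) = sin(1/x), which oscillates near 0."""
    return math.sin(1 / x)


# Points approaching 0 along which sin(1/x) is (numerically) 0 and 1.
zeros = [1 / (n * math.pi) for n in range(1, 200)]
peaks = [1 / ((2 * n + 0.5) * math.pi) for n in range(1, 200)]

# f keeps increasing along the points approaching 0.
f_values = [f(x) for x in zeros]
f_increasing = all(a < b for a, b in zip(f_values, f_values[1:]))

# g stays near 0 along one sequence and near 1 along the other.
g_near_zero = all(abs(g(x)) < 1e-9 for x in zeros)
g_near_one = all(abs(g(x) - 1) < 1e-9 for x in peaks)

print(f_increasing, g_near_zero, g_near_one)
```

Because g takes values near 0 and near 1 at points arbitrarily close to 0, its values cannot approach a single real number, and they also do not grow without bound; the values of f, by contrast, exceed any given bound.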


Definition 6.2.5. Let $I \subseteq \mathbb{R}$ be an interval, let $c \in I$ and let $f\colon I - \{c\} \to \mathbb{R}$ be a
function.


1. Suppose that I is an open interval. The function f diverges to infinity as x
goes to c, written

$$\lim_{x \to c} f(x) = \infty,$$

if for each $M \in \mathbb{R}$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$
imply $f(x) > M$. The function f diverges to negative infinity as x goes to c,
written

$$\lim_{x \to c} f(x) = -\infty,$$

if for each $N \in \mathbb{R}$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta$
imply $f(x) < N$.

2. Suppose that c is not a right endpoint of I. The function f diverges to infinity
as x goes to c from the right, written

$$\lim_{x \to c^+} f(x) = \infty,$$

if for each $M \in \mathbb{R}$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $c < x < c + \delta$
imply $f(x) > M$. The function f diverges to negative infinity as x goes to c
from the right, written

$$\lim_{x \to c^+} f(x) = -\infty,$$

if for each $N \in \mathbb{R}$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $c < x < c + \delta$
imply $f(x) < N$.

3. Suppose that c is not a left endpoint of I. The function f diverges to infinity
as x goes to c from the left, written

$$\lim_{x \to c^-} f(x) = \infty,$$

if for each $M \in \mathbb{R}$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $c - \delta < x < c$
imply $f(x) > M$. The function f diverges to negative infinity as x goes to c
from the left, written

$$\lim_{x \to c^-} f(x) = -\infty,$$

if for each $N \in \mathbb{R}$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and $c - \delta < x < c$
imply $f(x) < N$. △


When proving that a function diverges to infinity, it is not necessary to consider
all $M \in \mathbb{R}$; it is sufficient to consider only $M \in \mathbb{R}$ such that $M > M_0$, for any given
choice of $M_0 \in \mathbb{R}$. See Exercise 6.2.9 for details. For example, we will sometimes
restrict our attention to M > 0. The analogous result holds for one-sided divergence 
to infinity, and for divergence to negative infinity. 

The following lemma, which is the analog of Lemma 3.2.17, shows the expected 
relation between a function diverging to infinity at a point c and the function diverging 
to infinity from the left and from the right at c. As expected, the analogous result 
holds for divergence to negative infinity. 


Lemma 6.2.6. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f: I - \{c\} \to \mathbb{R}$ be a
function. Then $\lim_{x \to c} f(x) = \infty$ if and only if $\lim_{x \to c^+} f(x) = \infty$ and $\lim_{x \to c^-} f(x) = \infty$.


Proof. Left to the reader in Exercise 6.2.12. 
Example 6.2.7.
(1) We will prove that $\lim_{x \to 0} \frac{1}{x^2} = \infty$. Let $M \in \mathbb{R}$. We may assume that $M > 0$. Let
$\delta = \frac{1}{\sqrt{M}}$. Suppose that $x \in \mathbb{R} - \{0\}$ and $|x - 0| < \delta$. Then $|x| < \delta$, and hence $|x| < \frac{1}{\sqrt{M}}$,
which implies $x^2 = |x|^2 < \frac{1}{M}$. Because $x^2 > 0$ and $M > 0$, we deduce that $\frac{1}{x^2} > M$.

(2) Let $f: \mathbb{R} - \{0\} \to \mathbb{R}$ be defined by $f(x) = \frac{1}{x}$ for all $x \in \mathbb{R} - \{0\}$. We will
prove that $f$ does not diverge to infinity or to negative infinity as $x$ goes to 0. First,
we will prove that $\lim_{x \to 0^+} \frac{1}{x} = \infty$. Let $M \in \mathbb{R}$. We may assume that $M > 0$. Let $\delta = \frac{1}{M}$.
Suppose that $x \in \mathbb{R} - \{0\}$ and $0 < x < 0 + \delta$. Then $0 < x < \frac{1}{M}$, and hence $\frac{1}{x} > M$. It
follows that $\lim_{x \to 0^+} \frac{1}{x} = \infty$. A similar argument shows that $\lim_{x \to 0^-} \frac{1}{x} = -\infty$. It now follows
from Lemma 6.2.6 that $f$ does not diverge to infinity or to negative infinity as $x$ goes
to 0. ◊
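
The choice of $\delta = 1/\sqrt{M}$ in Example 6.2.7 (1) can be spot-checked numerically. The following Python sketch is only an illustration under stated assumptions, not a proof: the helper names `witness_delta` and `check`, the sample values of $M$ and the number of sample points are all choices made for this example.

```python
import math

def witness_delta(M):
    """Return delta = 1/sqrt(M), the choice made in Example 6.2.7 (1); needs M > 0."""
    return 1.0 / math.sqrt(M)

def check(M, samples=1000):
    """Test 1/x^2 > M at sample points x with 0 < |x| < delta."""
    delta = witness_delta(M)
    for k in range(1, samples + 1):
        x = delta * k / (samples + 1)      # a point with 0 < x < delta
        for point in (x, -x):              # look at both sides of 0
            if not 1.0 / point**2 > M:
                return False
    return True

# Hypothetical sample bounds M; any positive values could be used instead.
results = {M: check(M) for M in (1.0, 10.0, 1e6)}
print(results)
```

A finite sample cannot replace the argument in the example, but it can quickly expose a wrongly chosen $\delta$.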


Our next theorem is the Type 2 analog of parts of Theorem 6.2.4. Observe that 
Theorem 6.2.8 is missing a treatment of differences and quotients of functions, an 
absence that will be explained in Section 6.3. Even those parts of Theorem 6.2.4 
that do have analogs in Theorem 6.2.8 are more complicated in the latter than in the 
former. In general, Type 2 limits to infinity are not as well behaved as Type 1 limits to
infinity, which should not be too surprising, given that Type 1 limits to infinity involve
convergence to real numbers, and so we can use addition, subtraction, multiplication 
and division with such limits, whereas Type 2 limits to infinity involve divergence, 
and so we cannot use these four operations. 

In the following theorem, when we say “$\lim_{x \to c} h(x)$ exists” we mean it as an ordinary
limit, not as a Type 2 limit to infinity, because the latter type of limit does not exist as
a real number, which is the only type of number with which we are working.


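Theorem 6.2.8. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$, let $f, g, h: I - \{c\} \to \mathbb{R}$ be
functions and let $k \in \mathbb{R}$.

1. Suppose that $\lim_{x \to c} f(x) = \infty$ and $\lim_{x \to c} g(x) = \infty$, and that $\lim_{x \to c} h(x)$ exists. Then
$\lim_{x \to c} [f + g](x) = \infty$ and $\lim_{x \to c} [f + h](x) = \infty$.

2. Suppose that $\lim_{x \to c} f(x) = -\infty$ and $\lim_{x \to c} g(x) = -\infty$, and that $\lim_{x \to c} h(x)$ exists.
Then $\lim_{x \to c} [f + g](x) = -\infty$ and $\lim_{x \to c} [f + h](x) = -\infty$.

3. Suppose that $\lim_{x \to c} f(x) = \infty$ and $\lim_{x \to c} g(x) = -\infty$. If $k > 0$, then $\lim_{x \to c} [kf](x) = \infty$
and $\lim_{x \to c} [kg](x) = -\infty$. If $k < 0$, then $\lim_{x \to c} [kf](x) = -\infty$ and $\lim_{x \to c} [kg](x) = \infty$.

4. Suppose that $\lim_{x \to c} f(x) = \infty$ and $\lim_{x \to c} g(x) = \infty$, and that $\lim_{x \to c} h(x)$ exists. Then
$\lim_{x \to c} [fg](x) = \infty$. If $\lim_{x \to c} h(x) > 0$, then $\lim_{x \to c} [fh](x) = \infty$. If $\lim_{x \to c} h(x) < 0$, then
$\lim_{x \to c} [fh](x) = -\infty$.

5. Suppose that $\lim_{x \to c} f(x) = -\infty$ and $\lim_{x \to c} g(x) = -\infty$, and that $\lim_{x \to c} h(x)$ exists.
Then $\lim_{x \to c} [fg](x) = \infty$. If $\lim_{x \to c} h(x) > 0$, then $\lim_{x \to c} [fh](x) = -\infty$. If $\lim_{x \to c} h(x) < 0$,
then $\lim_{x \to c} [fh](x) = \infty$.

6. Suppose that $\lim_{x \to c} f(x) = \infty$ and $\lim_{x \to c} g(x) = -\infty$. Then $\lim_{x \to c} [fg](x) = -\infty$.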


Proof. We will prove Parts (1) and (4), leaving the rest to the reader in Exercise 6.2.13. 
(1) We start with $f + g$. Let $M \in \mathbb{R}$. We may assume that $M > 0$. There is some
$\delta_1 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_1$ imply $f(x) > M$, and there is some
$\delta_2 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply $g(x) > M$. Let $\delta = \min\{\delta_1, \delta_2\}$.
Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. Then

$$[f + g](x) = f(x) + g(x) > M + M > M.$$

We now turn to $f + h$. Let $Q = \lim_{x \to c} h(x)$. Let $N \in \mathbb{R}$. There is some $\eta_1 > 0$ such
that $x \in I - \{c\}$ and $|x - c| < \eta_1$ imply $f(x) > N - Q + 1$, and there is some $\eta_2 > 0$
such that $x \in I - \{c\}$ and $|x - c| < \eta_2$ imply $|h(x) - Q| < 1$. Let $\eta = \min\{\eta_1, \eta_2\}$.
Suppose that $x \in I - \{c\}$ and $|x - c| < \eta$. Then $|h(x) - Q| < 1$, which implies that
$-1 < h(x) - Q < 1$, and hence $Q - 1 < h(x)$. Then

$$[f + h](x) = f(x) + h(x) > (N - Q + 1) + (Q - 1) = N.$$


(4) We start with $fg$. Let $M \in \mathbb{R}$. We may assume that $M > 1$. There is some
$\delta_1 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_1$ imply $f(x) > M$, and there is some
$\delta_2 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply $g(x) > M$. Let $\delta = \min\{\delta_1, \delta_2\}$.
Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. Then

$$[fg](x) = f(x)g(x) > M \cdot M > M.$$

We now turn to $fh$. Suppose that $\lim_{x \to c} h(x) > 0$. By the Sign-Preserving Property
for Limits (Theorem 3.2.4) there is some $Q > 0$ and some $\eta_1 > 0$ such that $x \in I - \{c\}$
and $|x - c| < \eta_1$ imply $h(x) > Q$.

Let $N \in \mathbb{R}$. We may assume that $N > 0$. There is some $\eta_2 > 0$ such that $x \in I - \{c\}$
and $|x - c| < \eta_2$ imply $f(x) > \frac{N}{Q}$. Let $\eta = \min\{\eta_1, \eta_2\}$. Suppose that $x \in I - \{c\}$ and
$|x - c| < \eta$. Then

$$[fh](x) = f(x)h(x) > \frac{N}{Q} \cdot Q = N.$$

The case where $\lim_{x \to c} h(x) < 0$ is similar, and we omit the details. □


The analog of Theorem 6.2.8 for right-hand limits and left-hand limits also holds, 
though we omit the details. 

Having discussed Type 1 and Type 2 limits to infinity separately, we note that it is 
also possible to combine these two types of limits, and consider limits of the form 
$\lim_{x \to \infty} f(x) = \infty$, and also with $-\infty$ replacing one or both occurrences of $\infty$. This topic


is left to the reader in Exercise 6.2.15. 

Finally, we note that because many of the properties of limits to infinity (espe- 
cially Type 1 limits to infinity) are analogous to properties of ordinary limits (as 
discussed in Section 3.2), some authors combine various aspects of limits to infinity 
and ordinary limits by making use of the “extended real numbers,” which is the set 
$\mathbb{R}^* = \mathbb{R} \cup \{-\infty, \infty\}$, where “$-\infty$” and “$\infty$” are two symbols not in the real numbers.
The operations addition and multiplication can then partially be extended to include
$-\infty$ and $\infty$, for example by stating $\infty + \infty = \infty$, and $x + \infty = \infty$ for all $x \in \mathbb{R}$. These
last two properties are meant to reflect Theorem 6.2.8 (1), and other parts of that
theorem also lead to properties of the extended real numbers. It is also possible to
define an order relation on the extended real numbers by setting $-\infty < x < \infty$ for all
$x \in \mathbb{R}$. However, not all properties of the real numbers have analogs in the extended
real numbers. For example, as will be seen in Exercise 6.3.2 and Example 6.3.1,
it is not possible to say that $\infty + (-\infty)$ equals 0, or that $\frac{\infty}{\infty}$ equals 1. Formally, the
extended real numbers are not an ordered field, as defined in Definition 2.2.1. The
symbols “$-\infty$” and “$\infty$” are not real numbers, and do not behave as do real numbers,
and hence care is needed when using the extended real numbers. Having said that,
the extended real numbers can be defined quite rigorously, for example using the
method of Dedekind cuts, which we used to construct the ordinary real numbers in
Sections 1.6 and 1.7. The advantage of using the extended real numbers is efficiency,
for example by combining the statements of Theorem 3.2.10 and Theorem 6.2.4 into
a single theorem by looking at limits of the form $\lim_{x \to c} f(x)$, where $c \in \mathbb{R}^*$. For our


purposes, however, the efficiency gained by using the extended real numbers is not 
worth the price of having to offer a rigorous treatment of the extended real numbers, 
and hence we will not be using this concept. 
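
As a loose illustration of the partial arithmetic just described, IEEE 754 floating-point numbers, available in Python through `math.inf`, follow similar conventions: a sum such as $\infty + \infty$ is defined, whereas $\infty + (-\infty)$ and $\frac{\infty}{\infty}$ produce the special value NaN ("not a number"). The analogy with floating-point arithmetic is an assumption of this sketch; it is not a construction of the extended real numbers, and the variable names are chosen only for this example.

```python
import math

inf = math.inf

# Conventions that mirror Theorem 6.2.8 (1): inf + inf = inf and x + inf = inf.
sum_inf = inf + inf
shifted = 5.0 + inf

# The order relation -inf < x < inf, tested for one real number x.
ordered = -inf < 5.0 < inf

# Indeterminate forms are given no value; floating point reports NaN.
difference = inf + (-inf)
quotient = inf / inf

print(sum_inf, shifted, ordered, difference, quotient)
```

The two NaN results correspond to the expressions that the text declines to assign a value.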


Reflections 


The reader might wonder why this section was not included in Chapter 3, which 
is devoted to limits and continuity. The definitions of limits in the present section are 
indeed just variants of the definition of limits given in Chapter 3, and technically it 
would have made sense to include the present section in that earlier chapter. There 
are, however, a few pedagogical reasons for arranging the material as we have. First, 
learning to use the $\varepsilon$-$\delta$ definition of limits given in Section 3.2 is initially tricky
for some students, and there is no advantage in making matters even more difficult 
by introducing a few variants of the original definition right at the start. Second, 
even though limits to infinity are formally rather similar to regular limits, there is a 
substantial conceptual difference between the limits discussed in Section 3.2 and those 
discussed in the present section, because there is no number called “infinity,” and 
hence the notion of converging to a number is quite distinct from converging to infinity. 
It is therefore helpful to make this distinction clear by separating the discussion of 
the two types of limits. Third, limits to infinity show up in two standard places in a 
typical calculus course (and introductory real analysis course), namely, l’Hôpital’s
Rule and improper integrals, and it is convenient to cluster these applications of limits 
to infinity together with the definition of such limits, rather than having the definition 
a few chapters before it is first used. The author organizes the material involving 
limits to infinity when he teaches introductory calculus the same way as is found in 
the present chapter. 


Exercises 


Exercise 6.2.1. Using only the definition of Type 1 limits to infinity, prove that each 
of the following limits holds. 


(1) tim 5 =0.
i 2x+7 _ 2
(2) lim st = 5- 


s xtt] _ 1 
(3) iim 3x24+x °° «3° 


Exercise 6.2.2. [Used in Lemma 6.2.2.] Prove Lemma 6.2.2. 
Exercise 6.2.3. [Used in Theorem 6.2.4.] Prove Theorem 6.2.4 (2) (3) (5). 


Exercise 6.2.4. Let $I \subseteq \mathbb{R}$ be a right unbounded interval, let $f: I \to \mathbb{R}$ be a function
and let $L \in \mathbb{R}$. Prove that $\lim_{x \to \infty} f(x) = L$ if and only if $\lim_{x \to -\infty} f(-x) = L$.

Exercise 6.2.5. Let $I \subseteq \mathbb{R}$ be a right unbounded interval, and let $f, g: I \to \mathbb{R}$ be
functions. Suppose that $\lim_{x \to \infty} f(x) = 0$, and that $g$ is bounded. Prove that $\lim_{x \to \infty} [fg](x) = 0$.

Exercise 6.2.6. Let $I \subseteq \mathbb{R}$ be a right unbounded interval, and let $f, g: I \to \mathbb{R}$ be
functions. Suppose that $f(x) \le g(x)$ for all $x \in I$. Prove that if $\lim_{x \to \infty} f(x)$ and $\lim_{x \to \infty} g(x)$
exist, then $\lim_{x \to \infty} f(x) \le \lim_{x \to \infty} g(x)$.

Exercise 6.2.7. Let $I \subseteq \mathbb{R}$ be an interval, let $c \in I$ and let $f: I - \{c\} \to \mathbb{R}$ be a
function. Suppose that for each $M \in \mathbb{R}$, there is some $x \in I - \{c\}$ such that $f(x) > M$.

(1) Prove that $\lim_{x \to c} f(x)$ does not exist.

(2) Is it necessarily true that $\lim_{x \to c} f(x) = \infty$? Give a proof or a counterexample.
Exercise 6.2.8. [Used in Example 4.2.5.] Let $I \subseteq \mathbb{R}$ be an interval, let $c \in I$ and let
$f, g: I - \{c\} \to \mathbb{R}$ be functions. Suppose that $\lim_{x \to c} f(x) = \infty$. Suppose that there is
some $q \in \mathbb{R}$ such that $q \ne 0$, and that for each $\delta > 0$, there is some $x \in I - \{c\}$ such
that $|x - c| < \delta$ and $g(x) = q$. Prove that $\lim_{x \to c} f(x)g(x)$ does not exist.

Exercise 6.2.9. [Used in Section 6.2.] Let $I \subseteq \mathbb{R}$ be an interval, let $c \in I$, let
$f: I - \{c\} \to \mathbb{R}$ be a function and let $M_0 \in \mathbb{R}$. Prove that $\lim_{x \to c} f(x) = \infty$ if and only
if for each $M \in \mathbb{R}$ such that $M > M_0$, there is some $\delta > 0$ such that $x \in I - \{c\}$ and
$|x - c| < \delta$ imply $f(x) > M$.

Exercise 6.2.10. Let $J \subseteq I \subseteq \mathbb{R}$ be open intervals, let $c \in J$ and let $f: I - \{c\} \to \mathbb{R}$
be a function. Prove that $\lim_{x \to c} f(x) = \infty$ if and only if $\lim_{x \to c} f|_J(x) = \infty$.


Exercise 6.2.11. Using only the definition of Type 2 limits to infinity, prove that 
lim +43 =. 

x3+ * 

Exercise 6.2.12. [Used in Lemma 6.2.6.] Prove Lemma 6.2.6. 

Exercise 6.2.13. [Used in Theorem 6.2.8.] Prove Theorem 6.2.8 (2) (3) (5) (6). 


Exercise 6.2.14. Let $I, J \subseteq \mathbb{R}$ be open intervals, let $c \in I$, let $d \in J$ and let $g: I - \{c\} \to
J - \{d\}$ and $f: J - \{d\} \to \mathbb{R}$ be functions. Suppose that $\lim_{y \to c} g(y) = d$ and that
$\lim_{x \to d} f(x) = \infty$. Prove that $\lim_{y \to c} (f \circ g)(y) = \infty$.

Exercise 6.2.15. [Used in Section 6.2, Exercise 6.2.16, Example 6.4.3, Exercise 7.2.16
and Exercise 8.2.10.]

(1) Let $I \subseteq \mathbb{R}$ be a right unbounded interval, and let $f: I \to \mathbb{R}$ be a function. Give
a precise definition of what it would mean to say that the function $f$ diverges
to infinity as $x$ goes to infinity, written $\lim_{x \to \infty} f(x) = \infty$.

(2) Using only the definition you gave in Part (1) of this exercise, prove that
$\lim_{x \to \infty} x^2 = \infty$.

(3) Using only the definition you gave in Part (1) of this exercise, prove that
$\lim_{x \to \infty} \sqrt{x} = \infty$.


(4) Using only the definition you gave in Part (1) of this exercise, prove that 


Exercise 6.2.16. This exercise makes use of Exercise 6.2.15. Let $I, J \subseteq \mathbb{R}$ be right
unbounded intervals, and let $f: I \to \mathbb{R}$ and $g: J \to I$ be functions. Suppose that
$\lim_{x \to \infty} f(x) = L$ for some $L \in \mathbb{R}$, and that $\lim_{x \to \infty} g(x) = \infty$. Prove that $\lim_{x \to \infty} (f \circ g)(x) = L$.


6.3 Computing Limits to Infinity 


Type 2 limits to infinity are not as well behaved as Type 1 limits to infinity. This
difference can be seen by comparing Theorem 6.2.4, which is for Type 1 limits to infinity,
with Theorem 6.2.8, which is for Type 2. The fact that the latter theorem is missing
a treatment of differences and quotients of functions, an absence we will explain
shortly, makes Type 2 limits to infinity harder to compute in practice. In the present
section we discuss a few methods that help us compute specific cases of Type 2 limits
to infinity, the most well-known of which is l’Hôpital’s Rule, and related topics.

To see what is difficult about certain categories of Type 2 limits to infinity, let us 
recall Theorem 6.2.8, which shows us those aspects of such limits that do work nicely. 
The statement of Theorem 6.2.8 (1), for example, can be summarized by writing 
“$\infty + \infty = \infty$,” and “$\infty + c = \infty$” for all $c \in \mathbb{R}$. Of course, we do not literally mean
addition of real numbers in these expressions, because the symbols “$\infty$” and “$-\infty$”
are not real numbers, and the notion of addition that we have for real numbers does 
not apply to these two symbols. However, these two expressions are useful in that 
they suggest in very concise terms the result stated in Theorem 6.2.8 (1), and we can 
use such expressions as long as we do not take them to be more than just suggestive 
notation. 

Using the above notation, we observe that whereas Theorem 6.2.8 treats expressions
such as $\infty + \infty$ and $\infty \cdot \infty$, missing from that theorem are expressions such as
$\infty - \infty$ and $\frac{\infty}{\infty}$. The following example shows us why $\frac{\infty}{\infty}$ is missing from Theorem 6.2.8.
The reader is asked in Exercise 6.3.2 to supply similar examples for $\infty - \infty$, which
show why $\infty + (-\infty)$ is missing from the theorem.


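Example 6.3.1. Let $f, g, h: \mathbb{R} - \{0\} \to \mathbb{R}$ be defined by $f(x) = \frac{1}{x^2}$ and $g(x) = \frac{1}{x^2}$ and
$h(x) = \frac{1}{x^4}$ for all $x \in \mathbb{R} - \{0\}$. We saw in Example 6.2.7 (1) that $\lim_{x \to 0} f(x) = \lim_{x \to 0} g(x) =
\infty$, and it follows from Theorem 6.2.8 (4) that $\lim_{x \to 0} h(x) = \lim_{x \to 0} [fg](x) = \infty$. Hence the
three limits $\lim_{x \to 0} \left[\frac{f}{g}\right](x)$, and $\lim_{x \to 0} \left[\frac{f}{h}\right](x)$, and $\lim_{x \to 0} \left[\frac{h}{f}\right](x)$ all have the form $\frac{\infty}{\infty}$.

On the other hand, we observe that $\left[\frac{f}{g}\right](x) = 1$ and $\left[\frac{f}{h}\right](x) = x^2$ and $\left[\frac{h}{f}\right](x) = \frac{1}{x^2}$
for all $x \in \mathbb{R} - \{0\}$. Therefore $\lim_{x \to 0} \left[\frac{f}{g}\right](x) = 1$, and $\lim_{x \to 0} \left[\frac{f}{h}\right](x) = 0$, and $\lim_{x \to 0} \left[\frac{h}{f}\right](x) =
\infty$. Hence there is no single value, either a real number or infinity, shared by all limits
of the form $\frac{\infty}{\infty}$. ◊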
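
The three quotients in Example 6.3.1 can be tabulated with exact rational arithmetic, which makes the different behaviors of the form $\frac{\infty}{\infty}$ visible. The following Python sketch is for illustration only; it takes $f(x) = g(x) = 1/x^2$ and $h(x) = 1/x^4$ as in the example, and the sample points $10^{-1}, \ldots, 10^{-4}$ are arbitrary choices.

```python
from fractions import Fraction

def f(x):
    return 1 / x**2     # f(x) = 1/x^2

def g(x):
    return 1 / x**2     # g(x) = 1/x^2

def h(x):
    return 1 / x**4     # h(x) = 1/x^4 = [fg](x)

# Sample points approaching 0 from the right (an arbitrary choice).
samples = [Fraction(1, 10**k) for k in range(1, 5)]

# Each row holds x, [f/g](x), [f/h](x) and [h/f](x).
table = [(x, f(x) / g(x), f(x) / h(x), h(x) / f(x)) for x in samples]
for row in table:
    print(row)
```

The second column stays at 1, the third shrinks like $x^2$ and the fourth grows like $1/x^2$, in agreement with the limits 1, 0 and $\infty$ computed in the example.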


Because of Exercise 6.3.2 and Example 6.3.1, we refer to the expressions $\infty - \infty$
and $\frac{\infty}{\infty}$ as “indeterminate forms.” By contrast, the expressions $\infty + \infty$ and $\infty \cdot \infty$ are not
indeterminate forms, because they both equal something specific (which happens to
be $\infty$).

Although the above discussion was motivated by the need to compute Type 2
limits to infinity, we note that there are also indeterminate forms for some types of
limits that do not involve infinity. In the following example, we see that $\frac{0}{0}$ is also an
indeterminate form.


Example 6.3.2. Let $f, g, h: \mathbb{R} - \{5\} \to \mathbb{R}$ be defined by $f(x) = x^2 - 25$ and $g(x) =
x - 5$ and $h(x) = x^3 - 15x^2 + 75x - 125$ for all $x \in \mathbb{R} - \{5\}$. Because $f$, $g$ and $h$ are
polynomial functions, they are continuous, as remarked in Example 3.3.7 (1), and
hence it is seen that $\lim_{x \to 5} f(x) = f(5) = 0$, and similarly $\lim_{x \to 5} g(x) = 0$ and $\lim_{x \to 5} h(x) = 0$.
Hence the three limits $\lim_{x \to 5} \left[\frac{f}{g}\right](x)$, and $\lim_{x \to 5} \left[\frac{h}{g}\right](x)$, and $\lim_{x \to 5} \left[\frac{g}{h}\right](x)$ all have the form
$\frac{0}{0}$.

We cannot apply Theorem 3.2.10 (5) for any of these three limits, because the
hypotheses of that theorem are not satisfied. Nonetheless, we can evaluate these three
limits as follows. First, we note that $f(x) = (x - 5)(x + 5)$ and $h(x) = (x - 5)^3$ for all
$x \in \mathbb{R} - \{5\}$. Next, we observe that as we take the limit of a function as $x$ approaches 5,
we never use the value $x = 5$ in our function, and hence $x - 5$ is never 0, and therefore
we can cancel by $x - 5$ when needed. Then

$$\lim_{x \to 5} \left[\frac{f}{g}\right](x) = \lim_{x \to 5} \frac{x^2 - 25}{x - 5} = \lim_{x \to 5} \frac{(x - 5)(x + 5)}{x - 5} = \lim_{x \to 5} (x + 5) = 5 + 5 = 10.$$

Similar computations show that $\lim_{x \to 5} \left[\frac{h}{g}\right](x) = 0$ and $\lim_{x \to 5} \left[\frac{g}{h}\right](x) = \infty$, where the latter
makes use of Example 6.2.7 (1) and Exercise 6.2.14; the details are left to the reader.
Hence there is no single value, either a real number or infinity, shared by all limits of
the form $\frac{0}{0}$. ◊


Although expressions such as $\frac{0}{0}$ and $\frac{\infty}{\infty}$ are indeterminate forms, it is important to
note that not every fraction involving at least one of 0 or $\infty$ is an indeterminate form.
For example, Theorem 3.2.10 (5) tells us that limits of the form $\frac{0}{c}$, where $c \in \mathbb{R} - \{0\}$,
are always equal to 0, and hence are not indeterminate.

We now have two theorems that show, when appropriately stated, that the forms
$\frac{c}{0}$ and $\frac{c}{\infty}$ are not indeterminate, having values $\infty$ and 0, respectively. For the first of
these theorems, the hypotheses in Part (2) are needed to guarantee that the function
does not approach $\infty$ from one side and $-\infty$ from the other side.


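Theorem 6.3.3. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f, g: I - \{c\} \to \mathbb{R}$
be functions. Suppose that $\lim_{x \to c} f(x)$ exists and $\lim_{x \to c} f(x) \ne 0$, that $g(x) \ne 0$ for all
$x \in I - \{c\}$ and that $\lim_{x \to c} g(x) = 0$.

1. $\lim_{x \to c} \left[\frac{f}{g}\right](x)$ does not exist.
2. If $\lim_{x \to c} f(x) > 0$ and $g(x) > 0$ for all $x \in I - \{c\}$, or if $\lim_{x \to c} f(x) < 0$ and $g(x) < 0$
for all $x \in I - \{c\}$, then $\lim_{x \to c} \left[\frac{f}{g}\right](x) = \infty$. If $\lim_{x \to c} f(x) > 0$ and $g(x) < 0$ for
all $x \in I - \{c\}$, or if $\lim_{x \to c} f(x) < 0$ and $g(x) > 0$ for all $x \in I - \{c\}$, then
$\lim_{x \to c} \left[\frac{f}{g}\right](x) = -\infty$.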
Proof. We will prove Part (1), leaving the remaining part to the reader in Exer- 
cise 6.3.5. 
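(1) Let $L = \lim_{x \to c} f(x)$. Exercise 3.2.9 (1) implies that $\lim_{x \to c} |f(x)| = |L|$. By hypothesis
$L \ne 0$, and so $|L| > 0$. By the Sign-Preserving Property for Limits (Theorem 3.2.4)
applied to the function $|f|$ there is some $M > 0$ and some $\delta_1 > 0$ such that $x \in I - \{c\}$
and $|x - c| < \delta_1$ imply $|f(x)| > M$.

Suppose that $\lim_{x \to c} \left[\frac{f}{g}\right](x)$ exists. Let $P = \lim_{x \to c} \left[\frac{f}{g}\right](x)$. Then there is some $\delta_2 > 0$ such
that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply $\left|\left[\frac{f}{g}\right](x) - P\right| < 1$. Because $\lim_{x \to c} g(x) = 0$, then
there is some $\delta_3 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_3$ imply $|g(x) - 0| < \frac{M}{|P| + 1}$.

Let $\delta = \min\{\delta_1, \delta_2, \delta_3\}$. Suppose that $x \in I - \{c\}$ and $|x - c| < \delta$. Then $|g(x)| =
|g(x) - 0| < \frac{M}{|P| + 1}$. We also have $\left|\left[\frac{f}{g}\right](x) - P\right| < 1$. By Lemma 2.3.9 (7) it follows
that $\left|\left[\frac{f}{g}\right](x)\right| - |P| < 1$, and hence $\frac{|f(x)|}{|g(x)|} < |P| + 1$. We also know that $|f(x)| > M$.
Therefore $\frac{M}{|g(x)|} < \frac{|f(x)|}{|g(x)|} < |P| + 1$, and hence $\frac{M}{|P| + 1} < |g(x)|$, which is a contradiction.
We conclude that $\lim_{x \to c} \left[\frac{f}{g}\right](x)$ does not exist. □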


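Theorem 6.3.4. Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and let $f, g: I - \{c\} \to \mathbb{R}$
be functions. Suppose that $\lim_{x \to c} f(x)$ exists, that $g(x) \ne 0$ for all $x \in I - \{c\}$, that
$\lim_{x \to c^-} g(x) = \infty$ or $\lim_{x \to c^-} g(x) = -\infty$ and that $\lim_{x \to c^+} g(x) = \infty$ or $\lim_{x \to c^+} g(x) = -\infty$. Then
$\lim_{x \to c} \left[\frac{f}{g}\right](x)$ exists and $\lim_{x \to c} \left[\frac{f}{g}\right](x) = 0$.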


Proof. Left to the reader in Exercise 6.3.6. 


The analogs of Theorem 6.3.3 and Theorem 6.3.4 for right-hand limits and left-hand
limits also hold, though we omit the details.

We now return to indeterminate forms, and in particular the forms $\frac{0}{0}$ and $\frac{\infty}{\infty}$. In
Example 6.3.2 we were able to evaluate some limits of the form $\frac{0}{0}$, but that was only
because we were able to cancel the “bad parts” of the numerator and the denominator.
However, such cancellation is often not possible in limits of the form $\frac{0}{0}$ and $\frac{\infty}{\infty}$,
and it would be helpful to have other ways of computing
such limits. A very useful tool for this purpose is l’Hôpital’s Rule. We will state and
prove two versions of l’Hôpital’s Rule, the first for $\frac{0}{0}$, and the second for $\frac{\infty}{\infty}$.

We start with l’Hôpital’s Rule for $\frac{0}{0}$, which is somewhat easier to prove than the
$\frac{\infty}{\infty}$ case.


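Theorem 6.3.5 (l’Hôpital’s Rule for $\frac{0}{0}$). Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and
let $f, g: I - \{c\} \to \mathbb{R}$ be functions. Suppose that $f$ and $g$ are differentiable, and that
$g'(x) \ne 0$ for all $x \in I - \{c\}$. Suppose that $\lim_{x \to c} f(x) = 0$ and $\lim_{x \to c} g(x) = 0$. If $\lim_{x \to c} \frac{f'(x)}{g'(x)}$
exists, then $\lim_{x \to c} \frac{f(x)}{g(x)}$ exists and

$$\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}.$$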


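Proof. Suppose that $\lim_{x \to c} \frac{f'(x)}{g'(x)}$ exists. Let $L = \lim_{x \to c} \frac{f'(x)}{g'(x)}$.

Let $\varepsilon > 0$. Then there is some $\delta_1 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_1$ imply

$$\left|\frac{f'(x)}{g'(x)} - L\right| < \frac{\varepsilon}{2}.$$

Let $I = (a, b)$. Because $g|_{(a,c)}$ and $g|_{(c,b)}$ are differentiable, and because $g'(x) \ne 0$
for all $x \in I - \{c\} = (a, c) \cup (c, b)$, it follows from Exercise 4.4.9 (1) that $g|_{(a,c)}$ and
$g|_{(c,b)}$ are injective.

Because there is at most one $x \in (a, c)$ such that $g(x) = 0$, and at most one $x \in (c, b)$
such that $g(x) = 0$, there is some $\delta_2 > 0$ such that $x \in I - \{c\}$ and $|x - c| < \delta_2$ imply
$g(x) \ne 0$. By Lemma 2.3.7 (2) there is some $\delta_3 > 0$ such that $(c - \delta_3, c + \delta_3) \subseteq I$. Let
$\delta = \min\{\delta_1, \delta_2, \delta_3\}$.

Suppose that $w \in I - \{c\}$ and $|w - c| < \delta$. Then $w \in (c - \delta, c)$ or $w \in (c, c + \delta)$.
Suppose that $w \in (c - \delta, c)$; the other case is similar, and we omit the details. By the
choice of $\delta$ we know that $g(w) \ne 0$. Because $\lim_{x \to c} f(x) = 0$ and $\lim_{x \to c} g(x) = 0$, it follows
from Exercise 3.2.1 and Theorem 3.2.10 that

$$\lim_{x \to c} \frac{f(w) - f(x)}{g(w) - g(x)} = \frac{f(w) - 0}{g(w) - 0} = \frac{f(w)}{g(w)}.$$

Hence there is some $\eta > 0$ such that $x \in I - \{c\}$ and $|x - c| < \eta$ imply

$$\left|\frac{f(w) - f(x)}{g(w) - g(x)} - \frac{f(w)}{g(w)}\right| < \frac{\varepsilon}{2}.$$

Choose some $e \in (w, c)$ such that $|e - c| < \eta$. By Cauchy’s Mean Value Theorem
(Theorem 4.4.5) there is some $q \in (w, e)$ such that $[f(e) - f(w)]g'(q) = [g(e) -
g(w)]f'(q)$. Because $g|_{(a,c)}$ is injective, it follows that $g(w) - g(e) \ne 0$. We know by
hypothesis that $g'(q) \ne 0$. Hence

$$\frac{f(w) - f(e)}{g(w) - g(e)} = \frac{f'(q)}{g'(q)}.$$

Because $q \in (w, e) \subseteq (c - \delta, c)$, then $q \in I - \{c\}$ and $|q - c| < \delta$. Hence $\left|\frac{f'(q)}{g'(q)} - L\right| <
\frac{\varepsilon}{2}$, and therefore

$$\left|\frac{f(w) - f(e)}{g(w) - g(e)} - L\right| < \frac{\varepsilon}{2}.$$

Then

$$\left|\frac{f(w)}{g(w)} - L\right| = \left|\frac{f(w)}{g(w)} - \frac{f(w) - f(e)}{g(w) - g(e)} + \frac{f(w) - f(e)}{g(w) - g(e)} - L\right|$$
$$\le \left|\frac{f(w)}{g(w)} - \frac{f(w) - f(e)}{g(w) - g(e)}\right| + \left|\frac{f(w) - f(e)}{g(w) - g(e)} - L\right| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

We conclude that $\lim_{x \to c} \frac{f(x)}{g(x)}$ exists, and that $\lim_{x \to c} \frac{f(x)}{g(x)} = L = \lim_{x \to c} \frac{f'(x)}{g'(x)}$. □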


The above proof of l’Hôpital’s Rule for $\frac{0}{0}$ (Theorem 6.3.5) might appear to be
more complicated than expected, but that is because we have kept the hypotheses of
the theorem to a minimum. A much shorter proof of l’Hôpital’s Rule for $\frac{0}{0}$, though
with stronger hypotheses, is found in Exercise 6.3.7. This shorter proof, though limited
in its applicability, provides the closest thing one could call intuitive motivation for
l’Hôpital’s Rule for $\frac{0}{0}$; ultimately, what is good about this theorem is its usefulness
for computing specific limits, not its intuitive appeal.

There are also variants of l’Hôpital’s Rule for $\frac{0}{0}$, where $x \to c$ is replaced with one
of $x \to c^+$, or $x \to c^-$, or $x \to \infty$, or $x \to -\infty$; we will not state these variants, because
there are no changes of substance between them and the version in Theorem 6.3.5,
but we will use them as needed.


Example 6.3.6.
(1) We use l’Hôpital’s Rule for $\frac{0}{0}$ to evaluate $\lim_{x \to 0} \frac{e^x - 1}{x}$.
Because both the numerator and the denominator are continuous functions (by Theorem
7.2.7 (2), Theorem 4.2.4, Example 3.3.3 (1) and Theorem 3.3.5), we see that
$\lim_{x \to 0} (e^x - 1) = e^0 - 1 = 0$ and $\lim_{x \to 0} x = 0$. It is left to the reader to verify that the
hypotheses of l’Hôpital’s Rule for $\frac{0}{0}$ hold for this example, and hence we see that

$$\lim_{x \to 0} \frac{e^x - 1}{x} = \lim_{x \to 0} \frac{e^x}{1} = \frac{e^0}{1} = 1.$$

(2) The limit $\lim_{x \to 0} \frac{\sin x}{x} = 1$ is often encountered in an introductory calculus course.
This limit certainly satisfies the hypotheses of l’Hôpital’s Rule for $\frac{0}{0}$, and it can be
computed very easily by

$$\lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = \frac{\cos 0}{1} = \frac{1}{1} = 1.$$

And yet, we need to ask whether this use of l’Hôpital’s Rule is legitimate. This limit
is not computed using l’Hôpital’s Rule in an introductory calculus course, but rather
it is usually computed via a geometric argument using the unit circle, and the reader
is urged to try to figure out why before reading on.

The reason is that in an introductory calculus course, the limit $\lim_{x \to 0} \frac{\sin x}{x} = 1$ is used
in the proof that $\sin' = \cos$, and so it would not be legitimate to use that fact to compute
the limit, but that is just what is done when the limit is computed using l’Hôpital’s
Rule. By contrast, our proof that $\sin' = \cos$, which will be given in Theorem 7.3.12 (1),
uses a very different definition of $\sin$ than the geometric unit circle definition seen in
calculus courses, and in particular does not use the limit $\lim_{x \to 0} \frac{\sin x}{x} = 1$. Hence, in our
context, it is fine to use l’Hôpital’s Rule for this limit.

(3) l’Hôpital’s Rule is so pleasant to use that a problem in its application is that
sometimes it is used even when the situation does not permit it. It is left to the reader
to find the flaw with the following “calculation” written by an overly eager calculus
student:

$$\text{“}\lim_{x \to 0} \frac{\sin x}{x + x^2} = \lim_{x \to 0} \frac{\cos x}{1 + 2x} = \lim_{x \to 0} \frac{-\sin x}{2} = \frac{-\sin 0}{2} = 0.\text{”}$$

(The correct value of the limit is 1.) ◊
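
The values found in Example 6.3.6 can be compared with direct numerical evaluation near 0. The following Python sketch is a sanity check under stated assumptions, not a proof: the function names `q1`, `q2` and `q3` and the sample points are chosen only for this illustration, and floating-point rounding limits how close to 0 the quotients can be trusted.

```python
import math

def q1(x):
    return (math.exp(x) - 1) / x       # quotient from Part (1)

def q2(x):
    return math.sin(x) / x             # quotient from Part (2)

def q3(x):
    return math.sin(x) / (x + x**2)    # quotient from Part (3)

# Evaluate each quotient at points approaching 0 from the right.
for x in (1e-1, 1e-3, 1e-5):
    print(x, q1(x), q2(x), q3(x))
```

All three columns approach 1, which supports the stated value of the limit in Part (3) and shows that the student's answer of 0 cannot be correct.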


We now turn to l’Hôpital’s Rule for $\frac{\infty}{\infty}$. Here we make use of Type 2 limits to
infinity. The proof for the $\frac{\infty}{\infty}$ case is a bit trickier than for the $\frac{0}{0}$ case, and in contrast to
the $\frac{0}{0}$ case, where a shorter proof may be obtained by strengthening the hypotheses,
there is no such easy route in the $\frac{\infty}{\infty}$ case.


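Theorem 6.3.7 (l’Hôpital’s Rule for $\frac{\infty}{\infty}$). Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$
and let $f, g: I - \{c\} \to \mathbb{R}$ be functions. Suppose that $f$ and $g$ are differentiable, and
that $g'(x) \ne 0$ for all $x \in I - \{c\}$. Suppose that $\lim_{x \to c^-} f(x) = \infty$ or $\lim_{x \to c^-} f(x) = -\infty$,
that $\lim_{x \to c^+} f(x) = \infty$ or $\lim_{x \to c^+} f(x) = -\infty$, that $\lim_{x \to c^-} g(x) = \infty$ or $\lim_{x \to c^-} g(x) = -\infty$ and
that $\lim_{x \to c^+} g(x) = \infty$ or $\lim_{x \to c^+} g(x) = -\infty$. If $\lim_{x \to c} \frac{f'(x)}{g'(x)}$ exists, then $\lim_{x \to c} \frac{f(x)}{g(x)}$ exists and

$$\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}.$$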
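Proof. Suppose that $\lim_{x \to c} \frac{f'(x)}{g'(x)}$ exists. Let $L = \lim_{x \to c} \frac{f'(x)}{g'(x)}$.

We will prove that $\lim_{x \to c^-} \frac{f(x)}{g(x)}$ exists and $\lim_{x \to c^-} \frac{f(x)}{g(x)} = L$. A similar argument shows
that $\lim_{x \to c^+} \frac{f(x)}{g(x)}$ exists and $\lim_{x \to c^+} \frac{f(x)}{g(x)} = L$, and we omit the details. It will then follow
from Lemma 3.2.17 that $\lim_{x \to c} \frac{f(x)}{g(x)}$ exists and equals $\lim_{x \to c} \frac{f'(x)}{g'(x)} = L$.

Let $\varepsilon > 0$. By Lemma 3.2.17 we see that $\lim_{x \to c^-} \frac{f'(x)}{g'(x)}$ exists and $\lim_{x \to c^-} \frac{f'(x)}{g'(x)} = L$. Then
there is some $\delta_1 > 0$ such that $x \in I - \{c\}$ and $c - \delta_1 < x < c$ imply

$$\left|\frac{f'(x)}{g'(x)} - L\right| < \frac{\varepsilon}{2}.$$

Let $I = (a, b)$. Choose some $u \in (a, c)$ such that $|u - c| < \delta_1$. Because $\lim_{x \to c^-} f(x) =
\infty$ or $\lim_{x \to c^-} f(x) = -\infty$, there is some $\delta_2 > 0$ such that $x \in I - \{c\}$ and $c - \delta_2 < x < c$
imply $|f(x)| > |f(u)|$. In particular, we see that $x \in I - \{c\}$ and $c - \delta_2 < x < c$ imply
$f(x) \ne 0$ and $f(x) \ne f(u)$. Similarly, because $\lim_{x \to c^-} g(x) = \infty$ or $\lim_{x \to c^-} g(x) = -\infty$, it
follows that there is some $\delta_3 > 0$ such that $x \in I - \{c\}$ and $c - \delta_3 < x < c$ imply
$g(x) \ne 0$ and $g(x) \ne g(u)$.

Let $\eta = \min\{\delta_2, \delta_3, \frac{c - u}{2}\}$. Then $u < c - \eta$. Let $S: (c - \eta, c) \to \mathbb{R}$ be defined by

$$S(x) = \frac{1 - \frac{g(u)}{g(x)}}{1 - \frac{f(u)}{f(x)}}$$

for all $x \in (c - \eta, c)$. Because $\lim_{x \to c^-} f(x) = \infty$ or $\lim_{x \to c^-} f(x) = -\infty$, and $\lim_{x \to c^-} g(x) = \infty$
or $\lim_{x \to c^-} g(x) = -\infty$, then it follows from the one-sided analogs of Theorem 6.3.4,
Exercise 3.2.1 and Theorem 3.2.10 that $\lim_{x \to c^-} S(x) = 1$. Hence there is some $\delta_4 > 0$
such that $x \in I - \{c\}$ and $c - \delta_4 < x < c$ imply $|S(x) - 1| < \frac{\varepsilon}{2|L| + \varepsilon}$.

Let $\delta = \min\{\delta_2, \delta_3, \delta_4, \frac{c - u}{2}\}$. Suppose that $w \in I - \{c\}$ and $c - \delta < w < c$. Then
$u < w < c$, and $g(w) \ne g(u)$, and $|S(w) - 1| < \frac{\varepsilon}{2|L| + \varepsilon}$. Because $g(w) \ne g(u)$, then
$S(w) \ne 0$.

By Cauchy’s Mean Value Theorem (Theorem 4.4.5) there is some $q \in (u, w)$ such
that $[f(w) - f(u)]g'(q) = [g(w) - g(u)]f'(q)$. We know by hypothesis that $g'(q) \ne 0$.
Hence, with a bit of rearranging, we see that

$$\frac{f(w)}{g(w)} \cdot \frac{1 - \frac{f(u)}{f(w)}}{1 - \frac{g(u)}{g(w)}} = \frac{f'(q)}{g'(q)},$$

and therefore

$$\frac{f(w)}{g(w)} = \frac{f'(q)}{g'(q)} S(w).$$

Because $q \in (u, w)$, it follows that $c - \delta_1 < u < q < w < c$. Hence $q \in I - \{c\}$ and
$c - \delta_1 < q < c$. Therefore $\left|\frac{f'(q)}{g'(q)}\right| - |L| \le \left|\frac{f'(q)}{g'(q)} - L\right| < \frac{\varepsilon}{2}$, and therefore $\left|\frac{f'(q)}{g'(q)}\right| < |L| + \frac{\varepsilon}{2}$.
Then

$$\left|\frac{f(w)}{g(w)} - L\right| = \left|\frac{f'(q)}{g'(q)} S(w) - \frac{f'(q)}{g'(q)} + \frac{f'(q)}{g'(q)} - L\right|$$
$$\le \left|\frac{f'(q)}{g'(q)}\right| \cdot |S(w) - 1| + \left|\frac{f'(q)}{g'(q)} - L\right|$$
$$< \left(|L| + \frac{\varepsilon}{2}\right) \cdot \frac{\varepsilon}{2|L| + \varepsilon} + \frac{\varepsilon}{2} = \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

We conclude that $\lim_{x \to c^-} \frac{f(x)}{g(x)}$ exists and equals $L$. □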

There are also variants of l’Hôpital’s Rule for $\frac{\infty}{\infty}$ (Theorem 6.3.7), where $x \to c$ is
replaced with one of $x \to c^+$, or $x \to c^-$, or $x \to \infty$, or $x \to -\infty$; again, we will not
state these variants, but will use them as needed.


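Example 6.3.8.

(1) We want to evaluate the limit $\lim_{x \to 0^+} x \ln x$. We cannot evaluate this limit as
$\left[\lim_{x \to 0^+} x\right] \cdot \left[\lim_{x \to 0^+} \ln x\right]$, because the one-sided analog of Theorem 3.2.10 (4) holds only
when each of these two limits exists, and yet $\lim_{x \to 0^+} \ln x = -\infty$, which will be proved in
Exercise 7.2.5. The limit $\lim_{x \to 0^+} x \ln x$ therefore has the form “$0 \cdot \infty$,” which is another
type of indeterminate form, as seen in Exercise 6.3.3. However, we can rewrite this
limit as $\lim_{x \to 0^+} \frac{\ln x}{\frac{1}{x}}$, which has the form $\frac{\infty}{\infty}$, and using the one-sided variant of l’Hôpital’s
Rule for $\frac{\infty}{\infty}$ (Theorem 6.3.7) we see that

$$\lim_{x \to 0^+} x \ln x = \lim_{x \to 0^+} \frac{\ln x}{\frac{1}{x}} = \lim_{x \to 0^+} \frac{\frac{1}{x}}{-\frac{1}{x^2}} = \lim_{x \to 0^+} (-x) = 0.$$

(2) We want to evaluate the limit $\lim_{x \to \infty} \frac{x}{\sqrt{5 + x^2}}$. This limit has the form $\frac{\infty}{\infty}$, and
the $x \to \infty$ variant of l’Hôpital’s Rule for $\frac{\infty}{\infty}$ is applicable. However, if we try to use
l’Hôpital’s Rule we obtain

$$\lim_{x \to \infty} \frac{x}{\sqrt{5 + x^2}} = \lim_{x \to \infty} \frac{x}{(5 + x^2)^{\frac{1}{2}}} = \lim_{x \to \infty} \frac{1}{\frac{1}{2} \cdot 2x(5 + x^2)^{-\frac{1}{2}}} = \lim_{x \to \infty} \frac{(5 + x^2)^{\frac{1}{2}}}{x}$$
$$= \lim_{x \to \infty} \frac{\frac{1}{2} \cdot 2x(5 + x^2)^{-\frac{1}{2}}}{1} = \lim_{x \to \infty} \frac{x}{(5 + x^2)^{\frac{1}{2}}}.$$

Having used l’Hôpital’s Rule twice, we returned to the original limit; clearly using
l’Hôpital’s Rule again would simply repeat the process. Hence, although l’Hôpital’s
Rule is applicable in this case, it is not actually helpful. Fortunately, there is an
alternative (and much simpler) way to evaluate this limit, which is

$$\lim_{x \to \infty} \frac{x}{\sqrt{5 + x^2}} = \lim_{x \to \infty} \frac{\frac{x}{x}}{\frac{\sqrt{5 + x^2}}{x}} = \lim_{x \to \infty} \frac{1}{\sqrt{\frac{5}{x^2} + 1}} = \frac{1}{\sqrt{0 + 1}} = 1.$$
◊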
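
Both limits in Example 6.3.8 can be checked against direct evaluation. The following Python sketch is an illustration under stated assumptions rather than a proof: the function names `p` and `r` and the sample exponents are hypothetical choices, and `math.log` is the natural logarithm.

```python
import math

def p(x):
    return x * math.log(x)             # Part (1): x ln x, for x > 0

def r(x):
    return x / math.sqrt(5 + x**2)     # Part (2): x / sqrt(5 + x^2)

# Points approaching 0 from the right for p, and growing without bound for r.
small = [p(10.0**-k) for k in (2, 4, 8)]
large = [r(10.0**k) for k in (1, 3, 6)]

print(small)
print(large)
```

The first list tends to 0 through negative values and the second tends to 1 from below, matching the values obtained in the example.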


The final result in this section, though not about indeterminate forms, is about a 
particular situation involving Type 2 limits to infinity. This theorem, which relates 
such limits to the derivatives of inverse functions, might appear somewhat technical, 
but it will be useful when we prove that the sine and cosine functions are differentiable 
in Section 7.3. 


Theorem 6.3.9. Let $(a, b] \subseteq \mathbb{R}$ be a non-degenerate half-open interval, and let
$f: (a, b] \to \mathbb{R}$ be a function. Suppose that $f$ is continuous on $(a, b]$ and differentiable
on $(a, b)$. Suppose that $f'(x) > 0$ for all $x \in (a, b)$, and that $\lim_{x \to b^-} f'(x) = \infty$. Then the
function $f^{-1}: f((a, b]) \to (a, b]$ is differentiable, and $[f^{-1}]'(f(b)) = 0$, where this
derivative is one-sided.


Proof. By Theorem 4.5.2 (2) we know that f is strictly increasing. It follows from 
Exercise 4.6.6 (1) that f((a,5]) is an interval of the form either (glb f((a,b]), f(d)] 
or (—co, f(b)]. Because f is strictly increasing, we know ff is injective, and hence 
f((a,b)) = f((a,b]) — {f(b)}, which means that f((a,b)) is an interval of the form 
either (glb f((a,b)), f(b)) or (2, f(b)). 

By Theorem 4.6.4 we know that f~! is differentiable on f((a,b)), and therefore 
all that needs to be proved is that f—! is differentiable at f(b) and [f~']'(f(b)) = 0. 

Let F: (a,b) — R be defined by 


x—b 
f(x) — f(8) 


for all x € (a,b). Because f is injective, then f(x) A f(b) for all x € (a,b), and 
therefore F is well-defined. We now show that lim F(z) = 0. Let € > 0. Because 
zob- 


F(x) = 


lim f’(x) = 9, there is some 6 > 0 such that x € (a,b) and b— 6 <x <b imply 
xX—b- 


f'(x) > 4. Suppose that z € (a,b) and b— 6 <z <b. By the Mean Value Theorem 
(Theorem 4.4.4) there is some c € (z,b) such that 


Hence f(b) — f(z) = f'(c)(b 


—z). Because c € (z,b), it follows that c € (a,b) and 
that b—6 <c<b.Hence f’(c) > 


z and therefore uO) < €. We now compute 
z—b 1 1 1 
F 0| = = 
Fa = —F5) TAF |= |F)|~ FO <* 


Hence ae 1 F(z) =0. 


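Given that $f$ is strictly increasing and continuous, Exercise 4.6.6 (2) implies
that $f^{-1}\colon f((a,b]) \to (a,b]$ is continuous at $f(b)$. Hence, by the one-sided analog of
Lemma 3.3.2, we see that $\lim_{y \to f(b)^-} f^{-1}(y)$ exists and $\lim_{y \to f(b)^-} f^{-1}(y) = f^{-1}(f(b)) = b$.

The one-sided analog of Theorem 3.2.12 now implies that $\lim_{y \to f(b)^-} (F \circ f^{-1})(y)$ exists
and $\lim_{y \to f(b)^-} (F \circ f^{-1})(y) = \lim_{z \to b^-} F(z)$. Therefore

\[ \lim_{y \to f(b)^-} \frac{f^{-1}(y) - f^{-1}(f(b))}{y - f(b)} = \lim_{y \to f(b)^-} \frac{f^{-1}(y) - b}{f(f^{-1}(y)) - f(b)} = \lim_{y \to f(b)^-} (F \circ f^{-1})(y) = \lim_{z \to b^-} F(z) = 0. \]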
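Theorem 6.3.9 can be checked numerically on a concrete function. The following is a minimal sketch, assuming the sample function $f(x) = 1 - \sqrt{1-x}$ on $(0,1]$, which is our own choice and does not appear in the text. This function is continuous on $(0,1]$, satisfies $f'(x) = \frac{1}{2\sqrt{1-x}} > 0$ on $(0,1)$, and $f'(x)$ goes to infinity as $x$ goes to $1$ from the left; its inverse is $f^{-1}(y) = 1 - (1-y)^2$. The one-sided difference quotient of $f^{-1}$ at $f(1) = 1$ should shrink to $0$.

```python
import math

def f(x):
    """Sample function on (0, 1]; f'(x) = 1 / (2*sqrt(1 - x)) is positive and unbounded near 1."""
    return 1.0 - math.sqrt(1.0 - x)

def f_inverse(y):
    """Inverse of the sample function f, defined on f((0, 1]) = (0, 1]."""
    return 1.0 - (1.0 - y) ** 2

def left_difference_quotient(h, b=1.0):
    """One-sided difference quotient of f_inverse at f(b), evaluated at y = f(b) - h."""
    y = f(b) - h
    return (f_inverse(y) - f_inverse(f(b))) / (y - f(b))

# The quotients should tend to 0 as h shrinks (for this f they are approximately h).
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:.0e}   quotient = {left_difference_quotient(h):.6e}")
```

For this particular $f$ the quotient simplifies algebraically to $1 - y$, so the printed values track $h$ itself; a numerical check of this kind illustrates the theorem but of course proves nothing.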


Reflections


In the larger scheme of real analysis, l’Hôpital’s Rule is not a particularly important
result. It is, nonetheless, included in this text because it is well-liked by students in
calculus courses, and it is quite useful in various calculations. For example, we will
see an application of l’Hôpital’s Rule to the number $e$ in Example 8.4.3. Moreover,
the proofs of both the $\frac{0}{0}$ and $\frac{\infty}{\infty}$ cases, which are nice applications of Cauchy’s Mean
Value Theorem (Theorem 4.4.5), are much more complicated than might at first be
expected, which is a good sign that something of interest is occurring.


Exercises 


Exercise 6.3.1. [Used in Exercise 6.4.13.] Let $p \in (0,\infty)$.
(1) Prove that $\lim_{x \to \infty} \frac{x^p}{e^x} = 0$. Intuitively, this limit says that the exponential function
grows faster than any polynomial as $x$ goes to infinity.
(2) Prove that $\lim_{x \to \infty} \frac{\ln x}{x^p} = 0$. Intuitively, this limit says that the logarithm grows
slower than any polynomial as $x$ goes to infinity.
Exercise 6.3.2. [Used in Section 6.2 and Section 6.3.] Find an example of functions 
f,g,k: (0,0¢) — R such that lim [f — g](x) and lim [f — A](x) and lim [| f — k](x) have 
the form co — oe, that lim [f “ a(x) = 0, that lim [f — h\(x) isa ee number and 
that lim [f —k](x) =o. = 
Exercise 6.3.3. [Used in Example 6.3.8.] In Example 6.3.8 (1) we saw an example of 


a limit that has the form 0-°°, where the value of the limit was 0. Find an example 
of functions f,g,h: (0,e¢) — R such that lim [fg](x) and lim [fh|(x) have the form 
Pes att 


0 -co, that lim [fg](x) is a positive number and that lim [fh] (x) = 9. 


Exercise 6.3.4. Find an example of functions $f,g\colon \mathbb{R} - \{0\} \to \mathbb{R}$ such that $\lim_{x \to 0} \frac{f(x)}{g(x)} = 0$,
but that $\lim_{x \to 0} \frac{f'(x)}{g'(x)}$ does not exist. An informal argument is sufficient.


Exercise 6.3.5. [Used in Theorem 6.3.3.] Prove Theorem 6.3.3 (2). 
Exercise 6.3.6. [Used in Theorem 6.3.4.] Prove Theorem 6.3.4. 


Exercise 6.3.7. [Used in Section 6.3.] Consider the following “proof” of l’Hôpital’s
Rule for $\frac{0}{0}$:

\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f(x) - 0}{g(x) - 0} = \lim_{x \to c} \frac{f(x) - f(c)}{g(x) - g(c)} = \lim_{x \to c} \frac{\frac{f(x)-f(c)}{x-c}}{\frac{g(x)-g(c)}{x-c}} = \frac{f'(c)}{g'(c)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}. \]

This proof is much simpler than the proof we gave for Theorem 6.3.5, but that is
because this shorter proof requires stronger hypotheses. Restate l’Hôpital’s Rule for
$\frac{0}{0}$ with the hypotheses needed to make this shorter proof work.

Exercise 6.3.8. [Used in Section 7.2.]
(1) The limit $\lim_{x \to 0^+} x^x$ has the form $0^0$. Prove that $\lim_{x \to 0^+} x^x = 1$.
(2) Find an example of a limit of the form $0^0$ such that the limit has value 0.


Exercise 6.3.9. [Used in Exercise 6.3.10 and Example 8.4.3.] Let $f\colon \mathbb{R} - \{0\} \to \mathbb{R}$ be
a function.
(1) Prove that $\lim_{x \to 0^+} f\left(\frac{1}{x}\right)$ exists if and only if $\lim_{t \to \infty} f(t)$ exists, and if these limits
exist then they are equal.
(2) Prove that $\lim_{x \to 0^-} f\left(\frac{1}{x}\right)$ exists if and only if $\lim_{t \to -\infty} f(t)$ exists, and if these limits
exist then they are equal.
(3) Prove that $\lim_{x \to 0} f\left(\frac{1}{x}\right)$ exists if and only if $\lim_{t \to \infty} f(t)$ and $\lim_{t \to -\infty} f(t)$ exist and are
equal, and if these three limits exist then they are equal.


Exercise 6.3.10. [Used in Example 10.4.11.] Let p: IR — R be a polynomial function. 
(1) Prove that 


[Use Exercise 6.3.9 (3).] 


Prove that f is differentiable, and that there is a polynomial function r: R— R 
such that 


1 
-4+ 
Oe ev, ifx40 
0, ifx =0. 


Prove that $h$ is infinitely differentiable, and that $h^{(n)}(0) = 0$ for all $n \in \mathbb{N}$.


6.4 Improper Integrals 


In our treatment of the Riemann integral in Chapter 5, we stressed that an integral
of the form $\int_a^b f(x)\,dx$ was for functions defined on closed bounded intervals. We
also saw that integrable functions are bounded. However, in various applications
of integration, for example Laplace transforms and continuous probability, it is
necessary to look at integrals where either the domain is not bounded (which is often
when the function has a horizontal asymptote), or the domain is a half-open interval or
an open bounded interval (which is often when the function has a vertical asymptote).
Such integrals cannot be evaluated directly as Riemann integrals, but it turns out
that they can be evaluated as limits of such integrals. These limits of
integrals are called “improper integrals.” There are two types of improper integrals,
corresponding roughly to the two types of limits to infinity that we saw in Section 6.2.

The main idea in the evaluation of improper integrals is as follows. Suppose that
we have a function with domain $[a,b)$. We can think of approximating this interval
by closed intervals of the form $[a,t]$, where $t \in (a,b)$, and where $t$ is thought of as
getting closer and closer to $b$. We can then define the improper integral of the
function on $[a,b)$ by evaluating the ordinary integral of the function on each closed
interval $[a,t]$, and then taking the limit as $t$ goes to $b$, if the limit exists. To be sure
that this approach is a good one, however, we should ask whether such a limiting
process works when our function is in fact defined on a closed bounded interval. That
is, let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let $f\colon [a,b] \to \mathbb{R}$
be a function. Suppose that $f$ is integrable. By Theorem 5.5.6 we know that $f|_{[a,t]}$ is
integrable for each $t \in (a,b)$. Is it true that $\lim_{t \to b^-} \int_a^t f(x)\,dx = \int_a^b f(x)\,dx$? Fortunately,
it was proved in Exercise 5.5.6 that the answer is yes. Hence, our idea for defining
improper integrals is consistent with the definition of integrals for closed bounded
intervals. A similar limiting process is used for functions with domains that are not
bounded.

We start with the following definition, which is needed for both types of improper 
integrals. 


Definition 6.4.1. Let $I \subseteq \mathbb{R}$ be an interval, and let $f\colon I \to \mathbb{R}$ be a function. The
function $f$ is locally integrable if $f|_{[a,b]}$ is integrable for every non-degenerate closed
bounded interval $[a,b] \subseteq I$. △


It follows from Theorem 5.4.11 that any continuous function is locally integrable. 
We now turn to the first type of improper integral, called a Type 1 improper
integral, and corresponding to Type 1 limits to infinity.


Definition 6.4.2.

1. Let $[a,\infty) \subseteq \mathbb{R}$ be a closed unbounded interval, and let $f\colon [a,\infty) \to \mathbb{R}$ be a
function. Suppose that $f$ is locally integrable. The function $f$ is improperly
integrable if $\lim_{t \to \infty} \int_a^t f(x)\,dx$ exists. If this limit exists, it is denoted $\int_a^\infty f(x)\,dx$,
and it is called the improper integral of $f$. If $f$ is improperly integrable, we
also say that the improper integral $\int_a^\infty f(x)\,dx$ is convergent; otherwise we
say that the improper integral $\int_a^\infty f(x)\,dx$ is divergent.

2. Let $(-\infty,b] \subseteq \mathbb{R}$ be a closed unbounded interval, and let $g\colon (-\infty,b] \to \mathbb{R}$ be a
function. Suppose that $g$ is locally integrable. The function $g$ is improperly
integrable if $\lim_{t \to -\infty} \int_t^b g(x)\,dx$ exists. If this limit exists, it is denoted $\int_{-\infty}^b g(x)\,dx$,
and it is called the improper integral of $g$. If $g$ is improperly integrable, we
also say that the improper integral $\int_{-\infty}^b g(x)\,dx$ is convergent; otherwise we
say that the improper integral $\int_{-\infty}^b g(x)\,dx$ is divergent. △


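Example 6.4.3.

(1) Let $f\colon [1,\infty) \to \mathbb{R}$ be defined by $f(x) = \frac{1}{x^2}$ for all $x \in [1,\infty)$. Then $f$ is
continuous, and therefore it is locally integrable. Hence we can compute

\[ \lim_{t \to \infty} \int_1^t \frac{1}{x^2}\,dx = \lim_{t \to \infty} \left[-\frac{1}{x}\right]_1^t = \lim_{t \to \infty} \left[\left(-\frac{1}{t}\right) - \left(-\frac{1}{1}\right)\right] = 0 + 1 = 1. \]

It follows that the improper integral $\int_1^\infty \frac{1}{x^2}\,dx$ is convergent and $\int_1^\infty \frac{1}{x^2}\,dx = 1$.

(2) Let $g\colon [1,\infty) \to \mathbb{R}$ be defined by $g(x) = \frac{1}{\sqrt{x}}$ for all $x \in [1,\infty)$. Then $g$ is
continuous, and therefore it is locally integrable. Hence we can compute

\[ \lim_{t \to \infty} \int_1^t \frac{1}{\sqrt{x}}\,dx = \lim_{t \to \infty} \left[2\sqrt{x}\right]_1^t = \lim_{t \to \infty} \left[2\sqrt{t} - 2\sqrt{1}\right] = \infty, \]

where the final equality follows from Exercise 6.2.15 (3) and Theorem 6.2.8. Hence
the improper integral $\int_1^\infty \frac{1}{\sqrt{x}}\,dx$ is divergent. ◊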
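The limiting process in Definition 6.4.2 can be watched numerically. The following is a minimal sketch, assuming the two sample integrands $\frac{1}{x^2}$ and $\frac{1}{\sqrt{x}}$ on $[1,\infty)$; the helper `midpoint_integral` is our own illustrative name, and it only approximates each ordinary integral over $[1,t]$ by a midpoint Riemann sum. As $t$ grows, the first column should settle near $1$, whereas the second should keep growing.

```python
import math

def midpoint_integral(func, a, b, n=100_000):
    """Approximate the ordinary integral of func over [a, b] by a midpoint Riemann sum."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return 1.0 / x**2            # sample integrand with a convergent improper integral

def g(x):
    return 1.0 / math.sqrt(x)    # sample integrand with a divergent improper integral

for t in (10.0, 100.0, 1000.0):
    approx_f = midpoint_integral(f, 1.0, t)
    approx_g = midpoint_integral(g, 1.0, t)
    # Closed forms from the antiderivatives -1/x and 2*sqrt(x), for comparison.
    exact_f = 1.0 - 1.0 / t
    exact_g = 2.0 * math.sqrt(t) - 2.0
    print(f"t = {t:7.1f}   f: {approx_f:.6f} (exact {exact_f:.6f})   g: {approx_g:.4f} (exact {exact_g:.4f})")
```

The table only suggests the behavior of the limits; convergence or divergence still has to be proved from the antiderivatives, as is done above.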

Type 1 improper integrals behave similarly to regular integrals in some ways, but
not all. For example, there are analogs for Type 1 improper integrals of Theorem 5.3.1
(1) (2) (3) and Theorem 5.3.2 (1) (2); see Exercise 6.4.2 for some of these. On the
other hand, there are clearly no Type 1 improper integral analogs of Theorem 5.3.1 (4)
and Theorem 5.3.2 (3). Moreover, whereas an integrable function must be bounded by
Theorem 5.3.3, a function can be Type 1 improperly integrable and yet not bounded,
as the reader is asked to show in Exercise 6.4.3.

Definition 6.4.2 deals with Type 1 improper integrals of functions defined on
closed unbounded intervals. We now turn to Type 1 improper integrals for functions
defined on all of $\mathbb{R}$. That is, we want to evaluate improper integrals of the form
$\int_{-\infty}^{\infty} f(x)\,dx$. The key observation is that the limits to negative infinity and to infinity
must be taken separately, to avoid cancellation due to coincidental symmetry; we will
see an example of such cancellation in Example 6.4.6 (2). The simplest way to
treat the limits to negative infinity and to infinity separately is to break up the integral
$\int_{-\infty}^{\infty} f(x)\,dx$ into a sum of the form $\int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx$. The question then arises
as to whether the choice of $c$ makes a difference, though fortunately the following
lemma shows that it does not.


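Lemma 6.4.4. Let $f\colon \mathbb{R} \to \mathbb{R}$ be a function, and let $c,d \in \mathbb{R}$. Suppose that $f$ is
locally integrable. Then $\int_{-\infty}^{c} f(x)\,dx$ and $\int_{c}^{\infty} f(x)\,dx$ are both convergent if and only if
$\int_{-\infty}^{d} f(x)\,dx$ and $\int_{d}^{\infty} f(x)\,dx$ are both convergent, and if these improper integrals are
convergent then

\[ \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx = \int_{-\infty}^{d} f(x)\,dx + \int_{d}^{\infty} f(x)\,dx. \]

Proof. Suppose that $\int_{-\infty}^{c} f(x)\,dx$ and $\int_{c}^{\infty} f(x)\,dx$ are convergent. The other implication
is similar, and we omit the details.
Let $s,t \in \mathbb{R}$. By Corollary 5.5.9 and Definition 5.5.8 we see that

\[ \int_{s}^{d} f(x)\,dx = \int_{s}^{c} f(x)\,dx + \int_{c}^{d} f(x)\,dx \]

and

\[ \int_{d}^{t} f(x)\,dx = \int_{d}^{c} f(x)\,dx + \int_{c}^{t} f(x)\,dx = -\int_{c}^{d} f(x)\,dx + \int_{c}^{t} f(x)\,dx. \]

We now use Theorem 6.2.4 and its analog for limits to negative infinity, together with
the fact that the limit to infinity or to negative infinity of a constant function is that
constant, to deduce that

\[ \lim_{s \to -\infty} \int_{s}^{d} f(x)\,dx = \lim_{s \to -\infty} \left\{ \int_{s}^{c} f(x)\,dx + \int_{c}^{d} f(x)\,dx \right\} = \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{d} f(x)\,dx \]

and

\[ \lim_{t \to \infty} \int_{d}^{t} f(x)\,dx = \lim_{t \to \infty} \left\{ -\int_{c}^{d} f(x)\,dx + \int_{c}^{t} f(x)\,dx \right\} = -\int_{c}^{d} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx. \]

Hence $\int_{-\infty}^{d} f(x)\,dx$ and $\int_{d}^{\infty} f(x)\,dx$ are convergent, and

\[ \int_{-\infty}^{d} f(x)\,dx + \int_{d}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx. \]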
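The independence of the splitting point in Lemma 6.4.4 can be illustrated numerically. The following is a minimal sketch, assuming the sample function $f(x) = \frac{1}{1+x^2}$, which is our own choice. It also assumes two facts not established in this section: that the arctangent is an antiderivative of $f$, and that the arctangent tends to $\pm\frac{\pi}{2}$ at $\pm\infty$. The names `left_tail` and `right_tail` are hypothetical helpers for the two improper integrals.

```python
import math

def left_tail(c):
    """Improper integral of 1/(1+x^2) over (-inf, c], assuming the antiderivative arctan."""
    return math.atan(c) - (-math.pi / 2)

def right_tail(c):
    """Improper integral of 1/(1+x^2) over [c, inf), assuming the antiderivative arctan."""
    return math.pi / 2 - math.atan(c)

# The two pieces change with c, but their sum should not.
for c in (-3.0, 0.0, 5.0):
    total = left_tail(c) + right_tail(c)
    print(f"c = {c:5.1f}   left = {left_tail(c):.6f}   right = {right_tail(c):.6f}   sum = {total:.6f}")
```

Each row splits the real line at a different point, and the sum stays at $\pi$ up to rounding, which is what the lemma predicts for any function whose two tails converge.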

Lemma 6.4.4 allows us to make the following definition. 
Definition 6.4.5. Let $f\colon \mathbb{R} \to \mathbb{R}$ be a function. Suppose that $f$ is locally integrable.
The function $f$ is improperly integrable if for any $c \in \mathbb{R}$ the improper integrals
$\int_{-\infty}^{c} f(x)\,dx$ and $\int_{c}^{\infty} f(x)\,dx$ are convergent. If both of these improper integrals are
convergent, the sum $\int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx$ is denoted $\int_{-\infty}^{\infty} f(x)\,dx$, and it is called
the improper integral of $f$. If $f$ is improperly integrable, we also say that the
improper integral $\int_{-\infty}^{\infty} f(x)\,dx$ is convergent; otherwise we say that the improper
integral $\int_{-\infty}^{\infty} f(x)\,dx$ is divergent. △


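Example 6.4.6.
(1) Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = |x|e^{-x^2}$ for all $x \in \mathbb{R}$. Then $f$ is continuous,
and therefore it is locally integrable. Hence we can compute

\[ \int_{-\infty}^{0} |x|e^{-x^2}\,dx + \int_{0}^{\infty} |x|e^{-x^2}\,dx = \lim_{s \to -\infty} \int_{s}^{0} (-x)e^{-x^2}\,dx + \lim_{t \to \infty} \int_{0}^{t} xe^{-x^2}\,dx \]
\[ = -\lim_{s \to -\infty} \left[-\frac{e^{-x^2}}{2}\right]_{s}^{0} + \lim_{t \to \infty} \left[-\frac{e^{-x^2}}{2}\right]_{0}^{t} \]
\[ = -\lim_{s \to -\infty} \left[-\frac{1}{2} + \frac{e^{-s^2}}{2}\right] + \lim_{t \to \infty} \left[-\frac{e^{-t^2}}{2} + \frac{1}{2}\right] \]
\[ = -\left[-\frac{1}{2} + 0\right] + \left[0 + \frac{1}{2}\right] = 1, \]

where the equality before last will be proved in Exercise 7.2.8 (2). It follows that the
improper integral $\int_{-\infty}^{\infty} |x|e^{-x^2}\,dx$ is convergent and $\int_{-\infty}^{\infty} |x|e^{-x^2}\,dx = 1$.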
(2) We want to evaluate the improper integral $\int_{-\infty}^{\infty} x\,dx$. A common mistake in
evaluating such integrals is to try to take the limits to negative infinity and to infinity
simultaneously, for example by the calculation

\[ \int_{-\infty}^{\infty} x\,dx = \lim_{t \to \infty} \int_{-t}^{t} x\,dx = \lim_{t \to \infty} \left[\frac{x^2}{2}\right]_{-t}^{t} = \lim_{t \to \infty} \left[\frac{t^2}{2} - \frac{(-t)^2}{2}\right] = \lim_{t \to \infty} 0 = 0. \]

It would be a mistake to deduce from the above calculation that the improper integral
$\int_{-\infty}^{\infty} x\,dx$ is convergent; the limit in the above calculation exists only because of the
symmetry of the function, not because of actual convergence of the improper integral.
If we evaluate the integral properly, we see that

\[ \int_{-\infty}^{0} x\,dx + \int_{0}^{\infty} x\,dx = \lim_{s \to -\infty} \int_{s}^{0} x\,dx + \lim_{t \to \infty} \int_{0}^{t} x\,dx = \lim_{s \to -\infty} \left[\frac{x^2}{2}\right]_{s}^{0} + \lim_{t \to \infty} \left[\frac{x^2}{2}\right]_{0}^{t} \]
\[ = \lim_{s \to -\infty} \frac{-s^2}{2} + \lim_{t \to \infty} \frac{t^2}{2}. \]

It can be verified that

\[ \lim_{s \to -\infty} \frac{-s^2}{2} = -\infty \quad \text{and} \quad \lim_{t \to \infty} \frac{t^2}{2} = \infty; \]

we omit the details. In particular, neither of these limits exists. Hence each of the
improper integrals $\int_{-\infty}^{0} x\,dx$ and $\int_{0}^{\infty} x\,dx$ is divergent, and therefore $\int_{-\infty}^{\infty} x\,dx$ is divergent.
It is important to note that $-\infty$ and $\infty$ do not “cancel each other out”; observe that
there is no mention of $\infty + (-\infty)$ in Theorem 6.2.8. ◊
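The danger of taking both limits at once can be seen with a few exact values. The following is a minimal sketch, assuming only the antiderivative $\frac{x^2}{2}$ of the function $x$; the function names are our own. The symmetric integrals over $[-t,t]$ are all $0$, but each half grows without bound, and a lopsided window such as $[-t,2t]$ (a hypothetical alternative way of letting both endpoints go to infinity) gives values that also grow without bound.

```python
def symmetric_integral(t):
    """Ordinary integral of x over [-t, t], from the antiderivative x^2 / 2."""
    return t**2 / 2 - (-t)**2 / 2

def left_half(s):
    """Ordinary integral of x over [s, 0], for s < 0."""
    return 0.0 - s**2 / 2

def right_half(t):
    """Ordinary integral of x over [0, t], for t > 0."""
    return t**2 / 2 - 0.0

def lopsided_integral(t):
    """Ordinary integral of x over [-t, 2t]; both endpoints still go to infinity."""
    return (2 * t)**2 / 2 - (-t)**2 / 2

for t in (10.0, 100.0, 1000.0):
    print(f"t = {t:7.1f}   symmetric = {symmetric_integral(t):.1f}   "
          f"left = {left_half(-t):.1f}   right = {right_half(t):.1f}   "
          f"lopsided = {lopsided_integral(t):.1f}")
```

The symmetric column is identically $0$ only because the two halves happen to match; any definition of the improper integral that depended on how the endpoints are paired would not be well-defined, which is why the two limits are taken separately.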


We now turn to the second type of improper integral, called a Type 2 improper
integral. For this type of improper integral, rather than looking at functions defined on
closed unbounded intervals, we now look at functions defined on open or half-open
bounded intervals. The evaluation of Type 1 improper integrals involves Type 1 limits
to infinity, and so there is a nice correspondence between these two uses of the term
“Type 1.” The evaluation of Type 2 improper integrals also involves taking limits,
though not necessarily limits to infinity, so the correspondence between the uses of
the term “Type 2” for improper integrals and limits to infinity is not immediately
evident, though we will see that there is more of a correspondence than is at first
apparent after we prove Lemma 6.4.9 below.

Definition 6.4.7.

1. Let $[a,b) \subseteq \mathbb{R}$ be a non-degenerate half-open interval, and let $f\colon [a,b) \to \mathbb{R}$ be
a function. Suppose that $f$ is locally integrable. The function $f$ is improperly
integrable if $\lim_{t \to b^-} \int_a^t f(x)\,dx$ exists. If this limit exists, it is denoted $\int_a^b f(x)\,dx$,
and it is called the improper integral of $f$. If $f$ is improperly integrable, we
also say that the improper integral $\int_a^b f(x)\,dx$ is convergent; otherwise we say
that the improper integral $\int_a^b f(x)\,dx$ is divergent.

2. Let $(a,b] \subseteq \mathbb{R}$ be a non-degenerate half-open interval, and let $f\colon (a,b] \to \mathbb{R}$ be
a function. Suppose that $f$ is locally integrable. The function $f$ is improperly
integrable if $\lim_{s \to a^+} \int_s^b f(x)\,dx$ exists. If this limit exists, it is denoted $\int_a^b f(x)\,dx$,
and it is called the improper integral of $f$. If $f$ is improperly integrable, we
also say that the improper integral $\int_a^b f(x)\,dx$ is convergent; otherwise we say
that the improper integral $\int_a^b f(x)\,dx$ is divergent. △
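Example 6.4.8.
(1) Let $f\colon (0,1] \to \mathbb{R}$ be defined by $f(x) = \frac{1}{\sqrt{x}}$ for all $x \in (0,1]$. Then $f$ is
continuous, and therefore it is locally integrable. Hence we can compute

\[ \lim_{s \to 0^+} \int_s^1 \frac{1}{\sqrt{x}}\,dx = \lim_{s \to 0^+} \left[2x^{1/2}\right]_s^1 = \lim_{s \to 0^+} \left[2\sqrt{1} - 2\sqrt{s}\right] = 2 - 2 \cdot 0 = 2, \]

where the penultimate equality follows from Exercise 7.2.11 (3) and the one-sided
analog of Theorem 3.2.10. It follows that the improper integral $\int_0^1 \frac{1}{\sqrt{x}}\,dx$ is convergent
and $\int_0^1 \frac{1}{\sqrt{x}}\,dx = 2$.

(2) Let $g\colon (0,1] \to \mathbb{R}$ be defined by $g(x) = \frac{1}{x^2}$ for all $x \in (0,1]$. Then $g$ is
continuous, and therefore it is locally integrable. Hence we can compute

\[ \lim_{s \to 0^+} \int_s^1 \frac{1}{x^2}\,dx = \lim_{s \to 0^+} \left[-\frac{1}{x}\right]_s^1 = \lim_{s \to 0^+} \left[-1 + \frac{1}{s}\right] = \infty, \]

where the final equality follows from Example 6.2.7 (2) and Theorem 6.2.8 (1). Hence
the improper integral $\int_0^1 \frac{1}{x^2}\,dx$ is divergent. ◊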
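The one-sided limits in Definition 6.4.7 can be tabulated in the same way as limits to infinity. The following is a minimal sketch, assuming the two sample integrands $\frac{1}{\sqrt{x}}$ and $\frac{1}{x^2}$ on $(0,1]$ and using their antiderivatives $2\sqrt{x}$ and $-\frac{1}{x}$ to evaluate the ordinary integrals over $[s,1]$; the function names are our own.

```python
import math

def integral_inv_sqrt(s):
    """Ordinary integral of 1/sqrt(x) over [s, 1], from the antiderivative 2*sqrt(x)."""
    return 2.0 * math.sqrt(1.0) - 2.0 * math.sqrt(s)

def integral_inv_square(s):
    """Ordinary integral of 1/x**2 over [s, 1], from the antiderivative -1/x."""
    return (-1.0 / 1.0) - (-1.0 / s)

# As s shrinks toward 0 from the right, the first column approaches 2
# and the second grows without bound.
for s in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"s = {s:.0e}   1/sqrt(x): {integral_inv_sqrt(s):.6f}   1/x^2: {integral_inv_square(s):.1f}")
```

Both integrands are unbounded near $0$, so unboundedness alone does not decide convergence; what matters is how fast the function grows near the missing endpoint.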


For Type 1 improper integrals, where the function is defined on intervals of the
form $[a,\infty)$, or $(-\infty,b]$ or $(-\infty,\infty)$, it is evident that the function cannot be integrated
in the ordinary (non-improper) way, because the function is not defined on a closed
bounded interval. On the other hand, consider the function $f\colon [0,1) \to \mathbb{R}$ defined by
$f(x) = x^2$ for all $x \in [0,1)$. In principle, if we wanted to integrate this function, we
would need to do so as an improper integral, because the function is not defined as
written on a closed bounded interval. Of course, in practice it would be very silly to
evaluate the integral $\int_0^1 x^2\,dx$ as an improper integral, because the function $f$ can be
extended to a continuous, and hence integrable, function $g\colon [0,1] \to \mathbb{R}$ defined by
$g(x) = x^2$ for all $x \in [0,1]$. The functions for which Type 2 improper integrals are
really intended are those defined on open or half-open intervals for which the function
cannot be extended to an integrable function at the endpoints of the interval. As we see
in the following lemma, the real use of Type 2 improper integrals is when the function
is not bounded, because bounded functions can always be dealt with by extending
the function to a closed bounded interval. The choice of such extension does not
matter, because, as was proved in Exercise 5.3.3 (3), if two functions defined on a
non-degenerate closed bounded interval differ at only finitely many points, then one
is integrable if and only if the other is, and if they are integrable then their integrals
are equal.

In the following lemma, as well as in the other results we will subsequently prove 
for Type 2 improper integrals, we treat functions defined on intervals of the form [a,b). 
The analogous results hold for intervals of the form (a,b]; for the sake of brevity we 
will not state such results, though we will use them as needed. 


Lemma 6.4.9. Let $[a,b) \subseteq \mathbb{R}$ be a non-degenerate half-open interval, and let
$f\colon [a,b) \to \mathbb{R}$ be a function. Suppose that $f$ is locally integrable and bounded. Then
$f$ is improperly integrable if and only if any extension $g\colon [a,b] \to \mathbb{R}$ of $f$ is integrable.
If $f$ is improperly integrable, then $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$ for any extension
$g\colon [a,b] \to \mathbb{R}$ of $f$.


Proof. First, suppose that $f$ is improperly integrable. Let $g\colon [a,b] \to \mathbb{R}$ be an extension
of $f$. Because $f$ is bounded, there is some $M \in \mathbb{R}$ such that $|f(x)| \le M$ for all
$x \in [a,b)$. We may assume that $M > 0$. Let $N = \max\{M, |g(b)|\}$. Then $N > 0$, and
$|g(x)| \le N$ for all $x \in [a,b]$.

Let $\varepsilon > 0$. Let $t \in (a,b)$. Suppose that $|t - b| < \frac{\varepsilon}{4N}$. Because $f$ is locally integrable,
we know that $g|_{[a,t]} = f|_{[a,t]}$ is integrable. By Theorem 5.4.7 (c) there is some partition
$P = \{x_0, x_1, \ldots, x_n\}$ of $[a,t]$ such that $U(g|_{[a,t]},P) - L(g|_{[a,t]},P) < \frac{\varepsilon}{2}$. Observe that
$x_n = t$. Let $x_{n+1} = b$, and let $Q = \{x_0, x_1, \ldots, x_{n+1}\}$. Then $Q$ is a partition of $[a,b]$.

Because $|g(x)| \le N$ for all $x \in [a,b]$, Exercise 5.4.9 (4) implies that $M_{n+1}(g) -
m_{n+1}(g) \le 2N$. Then

\[ U(g,Q) - L(g,Q) = \sum_{i=1}^{n+1} [M_i(g) - m_i(g)](x_i - x_{i-1}) \]
\[ = \sum_{i=1}^{n} [M_i(g) - m_i(g)](x_i - x_{i-1}) + [M_{n+1}(g) - m_{n+1}(g)](x_{n+1} - x_n) \]
\[ = [U(g|_{[a,t]},P) - L(g|_{[a,t]},P)] + [M_{n+1}(g) - m_{n+1}(g)](b - t) \]
\[ < \frac{\varepsilon}{2} + 2N \cdot \frac{\varepsilon}{4N} = \varepsilon. \]

Therefore $g$ is integrable by Theorem 5.4.7 (c). We can now apply Exercise 5.5.6 to $g$,
and we deduce that $\int_a^b g(x)\,dx = \lim_{t \to b^-} \int_a^t g(x)\,dx = \lim_{t \to b^-} \int_a^t f(x)\,dx = \int_a^b f(x)\,dx$.

Second, let $h\colon [a,b] \to \mathbb{R}$ be an extension of $f$. Suppose that $h$ is integrable.
Because $h|_{[a,t]} = f|_{[a,t]}$ for all $t \in (a,b)$, we use Exercise 5.5.6 again to see that
$\lim_{t \to b^-} \int_a^t f(x)\,dx = \lim_{t \to b^-} \int_a^t h(x)\,dx = \int_a^b h(x)\,dx$. Hence $f$ is improperly integrable and
$\int_a^b f(x)\,dx = \int_a^b h(x)\,dx$.

Because of Lemma 6.4.9, if one encounters an integral of the form $\int_a^b f(x)\,dx$, and
if the function $f$ is bounded, then the integral is an ordinary one; if the function is not
bounded then the integral is improper. The most commonly encountered functions
that are not bounded, but for which the domains are bounded intervals, are functions
with vertical asymptotes, and hence Type 2 improper integrals are often associated in
calculus courses with the notion of vertical asymptotes. Moreover, because Type 2
improper integrals most commonly occur in practice in the case of vertical asymptotes,
there is a partial correspondence between the use of the term “Type 2” for improper
integrals and the use of that term for limits to infinity.

As was the case for Type 1 improper integrals, we note that Type 2 improper
integrals satisfy analogs of some, though not all, properties of ordinary integrals; see
Exercise 6.4.7 for analogs of Theorem 5.3.1 (1) (3) and Theorem 5.3.2 (2).

The simplest use of Type 2 improper integrals is when a function is defined 
everywhere on a non-degenerate closed bounded interval except one of the endpoints. 
However, it is also possible to look at more complicated situations, for example when 
a function is undefined at both endpoints of a non-degenerate closed bounded interval, 
or is undefined at a point (or finitely many points) in the interior of such an interval. 
In all cases, we break up the domain of the function into finitely many subintervals 
such that Definition 6.4.7 can be applied to each. The function is then considered to 
be improperly integrable on the whole interval if it is improperly integrable on each 
subinterval, and if the latter holds, then the improper integral on the whole interval is 
the sum of the improper integrals on the subintervals. 


Example 6.4.10. In Example 5.6.6 (2) we saw that the integral $\int_{-1}^{1} \frac{1}{x^2}\,dx$ cannot
be evaluated directly by the Fundamental Theorem of Calculus Version II (Theorem 5.6.4).
The reason that the Fundamental Theorem of Calculus Version II is not
applicable to this integral is that the function is defined only on $[-1,0) \cup (0,1]$, and it
has vertical asymptotes at $x = 0$, one on each side.

The correct way to evaluate this integral is to break up the domain of the function
into the two intervals $[-1,0)$ and $(0,1]$, and then to evaluate each of the improper
integrals $\int_{-1}^{0} \frac{1}{x^2}\,dx$ and $\int_{0}^{1} \frac{1}{x^2}\,dx$. The entire improper integral $\int_{-1}^{1} \frac{1}{x^2}\,dx$ will then be
convergent if and only if both of the two improper integrals $\int_{-1}^{0} \frac{1}{x^2}\,dx$ and $\int_{0}^{1} \frac{1}{x^2}\,dx$
are convergent. We saw in Example 6.4.8 (2) that $\int_{0}^{1} \frac{1}{x^2}\,dx$ is divergent, and hence
$\int_{-1}^{1} \frac{1}{x^2}\,dx$ is divergent. ◊


There are some situations in which it is not possible to prove directly that an 
improper integral is convergent, and to compute its value, but where it is nonetheless 
possible to prove indirectly that the improper integral is convergent, even though an 
exact numerical value for the improper integral cannot be found. For the reader who 
is familiar with convergence tests for series, the following theorem about improper 
integrals should look very familiar, being the analog of the Comparison Test for 
series (which we will see in Section 9.3). We state the following theorem for Type 2
improper integrals, because we will need it in Section 7.4. The reader is asked to state
and prove the analogous result for Type 1 improper integrals in Exercise 6.4.10.


Theorem 6.4.11 (Comparison Test for Type 2 Improper Integrals). Let $[a,b) \subseteq \mathbb{R}$
be a non-degenerate half-open interval, and let $f,g\colon [a,b) \to \mathbb{R}$ be functions. Suppose
that $f$ and $g$ are locally integrable, and that there is some $\delta > 0$ such that $x \in [a,b)$ and
$|x - b| < \delta$ imply $0 \le f(x) \le g(x)$. If $g$ is improperly integrable, then $f$ is improperly
integrable.


Proof. Suppose that $g$ is improperly integrable. There are two cases, depending upon
whether $a < b - \delta$ or $a \ge b - \delta$. Suppose that $a < b - \delta$; the other case is similar,
and we omit the details. Let $c = b - \delta$. Then $a < c < b$.

If $t \in (c,b)$, then $f|_{[c,t]}$ and $g|_{[c,t]}$ are integrable, and by Theorem 5.3.2 we know
that $0 \le \int_c^t f(x)\,dx \le \int_c^t g(x)\,dx$. We also know that $\int_c^c f(x)\,dx = 0$ and $\int_c^c g(x)\,dx = 0$.
Let $F,G\colon [c,b) \to \mathbb{R}$ be defined by

\[ F(t) = \int_c^t f(x)\,dx \quad \text{and} \quad G(t) = \int_c^t g(x)\,dx \]

for all $t \in [c,b)$. Then $0 \le F(t) \le G(t)$ for all $t \in [c,b)$. By Exercise 5.6.4 we see that
$F$ is increasing.
Because $g$ is improperly integrable, it follows from Exercise 6.4.6 that $g|_{[c,b)}$
is improperly integrable, and hence $\lim_{t \to b^-} G(t) = \lim_{t \to b^-} \int_c^t g(x)\,dx$ exists. We can now
apply Exercise 4.5.10 (2) to deduce that $\lim_{t \to b^-} F(t)$ exists and $\lim_{t \to b^-} F(t) \le \lim_{t \to b^-} G(t)$.
Hence $\lim_{t \to b^-} \int_c^t f(x)\,dx$ exists, which means that $f|_{[c,b)}$ is improperly integrable. Using
Exercise 6.4.6 again we conclude that $f$ is improperly integrable.
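The mechanism of this proof, an increasing function $F$ trapped below a convergent $G$, can be observed numerically. The following is a minimal sketch, assuming the sample functions $g(x) = \frac{1}{\sqrt{1-x}}$ and $f(x) = \frac{\cos^2 x}{\sqrt{1-x}}$ on $[0,1)$, which are our own choices and satisfy $0 \le f(x) \le g(x)$. The values of $G$ come from the antiderivative $-2\sqrt{1-x}$, and the values of $F$ are only approximated by a midpoint Riemann sum (the helper `midpoint_integral` is an illustrative name).

```python
import math

def midpoint_integral(func, a, b, n=100_000):
    """Approximate the ordinary integral of func over [a, b] by a midpoint Riemann sum."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def g(x):
    return 1.0 / math.sqrt(1.0 - x)                  # dominating function

def f(x):
    return math.cos(x) ** 2 / math.sqrt(1.0 - x)     # satisfies 0 <= f(x) <= g(x) on [0, 1)

def G_exact(t):
    """Integral of g over [0, t], from the antiderivative -2*sqrt(1 - x); tends to 2 as t -> 1-."""
    return 2.0 - 2.0 * math.sqrt(1.0 - t)

ts = (0.9, 0.99, 0.999, 0.9999)
F_values = [midpoint_integral(f, 0.0, t) for t in ts]

# F should increase with t while staying below G, which itself stays below 2.
for t, F_t in zip(ts, F_values):
    print(f"t = {t:.4f}   F(t) ~ {F_t:.6f}   G(t) = {G_exact(t):.6f}")
```

An increasing function that is bounded above has a one-sided limit, which is exactly the step supplied by Exercise 4.5.10 (2) in the proof. The table does not give the value of the improper integral of $f$ in closed form, only evidence that it exists.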


In our discussion in Sections 7.3 and 7.4, we will need to know that Integration
by Parts and Integration by Substitution both work for Type 2 improper integrals,
as we will now see. The reader who will skip those sections can also safely skip
the following two theorems and proofs. The reader is asked to state and prove the
analogous results for Type 1 improper integrals in Exercise 6.4.11 and Exercise 6.4.12.

In the statement of the following theorem we should properly write “$(f \circ g|_{[a,b)}) \cdot
g'|_{[a,b)}$” rather than “$(f \circ g) \cdot g'$,” but for the sake of readability we abuse notation and
write the latter.


Theorem 6.4.12 (Integration by Substitution for Type 2 Improper Integrals). Let
$[a,b], [c,d] \subseteq \mathbb{R}$ be non-degenerate closed bounded intervals, and let $g\colon [a,b] \to [c,d]$
and $f\colon [c,d) \to \mathbb{R}$ be functions. Suppose that $f$ is continuous, that $g$ is bijective and
differentiable and that $g'$ is integrable. Then $(f \circ g) \cdot g'$ is improperly integrable if and
only if $f$ is improperly integrable, and if they are improperly integrable then

\[ \int_a^b f(g(x))g'(x)\,dx = \int_{g(a)}^{g(b)} f(x)\,dx. \]


Proof. Because $g$ is differentiable, then it is continuous by the closed interval analog
of Theorem 4.2.4. It then follows from Exercise 4.5.11 that $g$ is strictly monotone.
Suppose that $g$ is strictly increasing; the other case is similar, and we omit the details.
It follows from Exercise 4.6.3 (1) that $[g(a),g(b)] = g([a,b]) = [c,d]$, and hence that
$g(a) = c$ and $g(b) = d$. Therefore $g$ maps $[a,b)$ bijectively onto $[c,d)$.

Let $t \in (a,b)$. Because $g$ is strictly increasing, then so is $g|_{[a,t]}$. By Exercise 4.2.3 (5)
we know that $g|_{[a,t]}$ is differentiable, and hence it is continuous. It
follows from Exercise 4.6.3 (1) that $g([a,t]) = [g(a),g(t)] = [c,g(t)]$. Because $g'$ is
integrable, it follows from Theorem 5.5.6 that $g'|_{[a,t]}$ is integrable.

Because $f$ is continuous, then $f|_{[c,g(t)]}$ is continuous by Exercise 3.3.2 (2). We
can now apply Integration by Substitution for Definite Integrals (Theorem 5.7.4) to
$f|_{[c,g(t)]}$ and $g|_{[a,t]}$, and we deduce that $[(f \circ g) \cdot g']|_{[a,t]}$ is integrable and

\[ \int_a^t f(g(x))g'(x)\,dx = \int_{g(a)}^{g(t)} f(x)\,dx. \]


It now follows that $\lim_{t \to b^-} \int_a^t f(g(x))g'(x)\,dx$ exists if and only if $\lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx$
exists, and if these limits exist then

\[ \lim_{t \to b^-} \int_a^t f(g(x))g'(x)\,dx = \lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx. \]

By definition $\lim_{t \to b^-} \int_a^t f(g(x))g'(x)\,dx$ exists if and only if $(f \circ g) \cdot g'$ is improperly
integrable, and if this limit exists then it equals $\int_a^b f(g(x))g'(x)\,dx$. We claim that
$\lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx$ exists if and only if $\lim_{w \to d^-} \int_c^w f(x)\,dx$ exists, and if these limits exist
then they are equal. We will prove this claim shortly, but assuming that the claim
is true, we then observe that by definition $\lim_{w \to d^-} \int_c^w f(x)\,dx$ exists if and only if $f$ is
improperly integrable, and if this limit exists then it equals $\int_c^d f(x)\,dx$. If we put all
the above observations together, and recall that $g(a) = c$ and $g(b) = d$, it follows that
$(f \circ g) \cdot g'$ is improperly integrable if
and only if $f$ is improperly integrable, and if they are improperly integrable then

\[ \int_a^b f(g(x))g'(x)\,dx = \int_{g(a)}^{g(b)} f(x)\,dx. \]


It remains to prove that $\lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx$ exists if and only if $\lim_{w \to d^-} \int_c^w f(x)\,dx$
exists, and if these limits exist then they are equal. First, suppose that $\lim_{w \to d^-} \int_c^w f(x)\,dx$
exists. Let $F\colon [c,d) \to \mathbb{R}$ be defined by $F(z) = \int_c^z f(x)\,dx$ for all $z \in [c,d)$, which
makes sense because $f$ is locally integrable. Our hypothesis can then be restated by
saying that $\lim_{w \to d^-} F(w)$ exists. Because $g$ is continuous, then by the one-sided analog
of Lemma 3.3.2 we know that $\lim_{t \to b^-} g(t) = g(b) = d$. We now abuse notation and
think of $g$ as a function $[a,b) \to [c,d)$, and we can then apply the one-sided analog of
Theorem 3.2.12 to deduce that $\lim_{t \to b^-} (F \circ g)(t)$ exists and $\lim_{t \to b^-} (F \circ g)(t) = \lim_{w \to d^-} F(w)$,
which means that $\lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx = \lim_{t \to b^-} F(g(t))$ exists and $\lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx =
\lim_{w \to d^-} \int_c^w f(x)\,dx$.

Second, suppose that $\lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx$ exists. Observe that $g|_{(a,b]}$ is strictly
increasing and is continuous at $b$. We can therefore apply Exercise 4.6.6 (2) to deduce
that $g^{-1}\colon (c,d] \to (a,b]$ is continuous at $d = g(b)$. By the one-sided analog of
Lemma 3.3.2 it follows that $\lim_{w \to d^-} g^{-1}(w) = g^{-1}(d) = b$. The same type of argument
used in the previous paragraph can then be used to show that $\lim_{w \to d^-} \int_c^w f(x)\,dx$ exists
and $\lim_{w \to d^-} \int_c^w f(x)\,dx = \lim_{w \to d^-} \int_{g(a)}^{g(g^{-1}(w))} f(x)\,dx = \lim_{t \to b^-} \int_{g(a)}^{g(t)} f(x)\,dx$, and we omit the
details.


Theorem 6.4.13 (Integration by Parts for Type 2 Improper Integrals). Let
$[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let $f,g\colon [a,b] \to \mathbb{R}$
be functions. Suppose that $f$ and $g$ are continuous on $[a,b]$, that $f$ and $g$ are
differentiable on $[a,b)$ and that $f'$ and $g'$ are locally integrable on $[a,b)$. Then $f'g$ is
improperly integrable if and only if $fg'$ is improperly integrable, and if they are
improperly integrable then

\[ \int_a^b f(x)g'(x)\,dx = [f(b)g(b) - f(a)g(a)] - \int_a^b f'(x)g(x)\,dx. \]

Proof. Because $f$ and $g$ are continuous on $[a,b]$, then by the one-sided analog of
Lemma 3.3.2 we know that $\lim_{x \to b^-} f(x) = f(b)$ and $\lim_{x \to b^-} g(x) = g(b)$. By the one-sided
analogs of Theorem 3.2.10 (2) (4) and Exercise 3.2.1 we deduce that $\lim_{t \to b^-} [f(t)g(t) -
f(a)g(a)] = f(b)g(b) - f(a)g(a)$.

Let $t \in (a,b)$. Because $f$ and $g$ are differentiable on $[a,b)$, then by Exercise 4.2.3 (5)
$f|_{[a,t]}$ and $g|_{[a,t]}$ are differentiable. Because $f'$ and $g'$ are locally
integrable, then $f'|_{[a,t]}$ and $g'|_{[a,t]}$ are integrable. We can now apply Integration by Parts
for Definite Integrals (Theorem 5.7.6) to $f|_{[a,t]}$ and $g|_{[a,t]}$, and we deduce that $[f'g]|_{[a,t]}$
and $[fg']|_{[a,t]}$ are integrable and

\[ \int_a^t f(x)g'(x)\,dx = [f(t)g(t) - f(a)g(a)] - \int_a^t f'(x)g(x)\,dx. \]
Using the one-sided analog of Theorem 3.2.10 (2), and the fact that $\lim_{t \to b^-} [f(t)g(t) -
f(a)g(a)]$ exists and $\lim_{t \to b^-} [f(t)g(t) - f(a)g(a)] = f(b)g(b) - f(a)g(a)$, we deduce
that $\lim_{t \to b^-} \int_a^t f(x)g'(x)\,dx$ exists if and only if $\lim_{t \to b^-} \int_a^t f'(x)g(x)\,dx$ exists, and if these
limits exist then

\[ \lim_{t \to b^-} \int_a^t f(x)g'(x)\,dx = [f(b)g(b) - f(a)g(a)] - \lim_{t \to b^-} \int_a^t f'(x)g(x)\,dx. \]

We deduce immediately that $f'g$ is improperly integrable if and only if $fg'$ is improperly
integrable, and if they are improperly integrable then

\[ \int_a^b f(x)g'(x)\,dx = [f(b)g(b) - f(a)g(a)] - \int_a^b f'(x)g(x)\,dx. \]


Reflections


It might appear as if this section is making a big deal out of very little. First, why 
must we define improper integrals by limits, rather than defining them directly? Let us 
consider the case of Type 1 improper integrals. The definition of the Riemann integral
of a function on a closed bounded interval is in terms of Riemann sums, and the 
key observation is that the only type of sum that is guaranteed to exist is the sum of 
finitely many numbers, which is what a Riemann sum is. If one were to try to define 
a “Riemann sum” directly for a function on an unbounded interval, then either the 
unbounded interval would have to be subdivided into infinitely many subintervals, or 
at least one of the subintervals would itself have to be unbounded, and in neither case 
would it be certain that the sum could be evaluated. Hence, to be able to use Riemann 
sums, we must restrict our attention to functions on closed bounded intervals, and 
hence we define improper integrals as limits of regular integrals. 

Second, even if we accept as reasonable the use of limits in the definition of 
improper integrals, why must we go to all the effort of proving the various theorems 
in this section? The answer is seen by analogy with the concept of series, which the 
reader has likely seen informally in a calculus course (and which we will discuss
in this text in Chapter 9). The tricky part in dealing with series is not defining
what it means for series to be convergent in principle, but rather evaluating whether 
any given series is in fact convergent or divergent. We should think about improper 
integrals similarly, in that a number of the theorems in the present section are aimed at 
showing that certain improper integrals are convergent. Clearly the Comparison Test 
for Type 2 Improper Integrals, which is an analog of the corresponding convergence 
test for series, is a result of this sort, but even the last two theorems in this section can 
be thought of as results that tell us when certain improper integrals are convergent 
(with the added benefit of nice formulas for the values of these integrals when they 
exist). 


Exercises 


Exercise 6.4.1. [Used in Example 9.3.7.] Let $p \in \mathbb{R}$. Prove that the improper integral
$\int_1^\infty \frac{1}{x^p}\,dx$ is convergent if and only if $p > 1$. Make use of standard properties of the
graph of $x^p$; these properties will be proved rigorously in Exercise 7.2.16.


Exercise 6.4.2. [Used in Section 6.4.] Let $[a,\infty) \subseteq \mathbb{R}$ be a closed unbounded interval,
let $f,g\colon [a,\infty) \to \mathbb{R}$ be functions and let $k \in \mathbb{R}$. Suppose that $f$ and $g$ are improperly
integrable.

(1) Prove that $f+g$ is improperly integrable and
    $\int_a^\infty [f+g](x)\,dx = \int_a^\infty f(x)\,dx + \int_a^\infty g(x)\,dx$.
(2) Prove that $kf$ is improperly integrable and $\int_a^\infty [kf](x)\,dx = k\int_a^\infty f(x)\,dx$.
(3) Prove that if $f(x) \ge g(x)$ for all $x \in [a,\infty)$, then $\int_a^\infty f(x)\,dx \ge \int_a^\infty g(x)\,dx$.

Exercise 6.4.3. [Used in Section 6.4.] Find an example of a function $f\colon [1,\infty) \to \mathbb{R}$
such that $f$ is improperly integrable, but that $f$ is not bounded on any interval of the
form $[a,\infty)$, where $a \in [1,\infty)$. [Use Exercise 5.3.3 (3).]

Exercise 6.4.4. Let $[a,\infty) \subseteq \mathbb{R}$ be a closed unbounded interval, and let $f\colon [a,\infty) \to \mathbb{R}$
be a function. Suppose that $f$ is improperly integrable. Prove that if $\lim_{x\to\infty} f(x)$ exists,
then $\lim_{x\to\infty} f(x) = 0$. (Exercise 6.4.3 shows that $\lim_{x\to\infty} f(x)$ need not exist.)


Exercise 6.4.5. [Used in Section 7.3 and Theorem 7.4.3.] As for all other exercises 
in this section, you may use standard rules for integration, even if we have not yet 
proved them, but for this exercise do not use trigonometric functions and inverse 
trigonometric functions, because we will use this exercise in the definition of the 
arcsine function in Section 7.3. 


(1) Prove that the improper integral ie aS dx is convergent. 


(2) Prove that the improper integral $\int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx$ is convergent.

Exercise 6.4.6. [Used in Theorem 6.4.11.] Let $[a,b) \subseteq \mathbb{R}$ be a non-degenerate half-open
interval, let $c \in [a,b)$ and let $f\colon [a,b) \to \mathbb{R}$ be a function. Suppose that $f$
is locally integrable. Prove that $f$ is improperly integrable if and only if $f|_{[c,b)}$ is
improperly integrable.


Exercise 6.4.7. [Used in Section 6.4 and Theorem 7.4.3.] Let $[a,b) \subseteq \mathbb{R}$ be a non-degenerate
half-open interval, let $f,g\colon [a,b) \to \mathbb{R}$ be functions and let $k \in \mathbb{R}$. Suppose
that $f$ and $g$ are improperly integrable.

(1) Prove that $f+g$ is improperly integrable and
    $\int_a^b [f+g](x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.

(2) Prove that $kf$ is improperly integrable and $\int_a^b [kf](x)\,dx = k\int_a^b f(x)\,dx$.

(3) Prove that if $f(x) \ge g(x)$ for all $x \in [a,b)$, then $\int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx$.


Exercise 6.4.8. Find an example of functions f,g: (0, 1] — R such that f and g are 
improperly integrable, but that fg is not improperly integrable. 


Exercise 6.4.9. Let $[a,b) \subseteq \mathbb{R}$ be a non-degenerate half-open interval, and let
$f,g\colon [a,b) \to \mathbb{R}$ be functions. Suppose that $f(x) > 0$ and $g(x) > 0$ for all $x \in [a,b)$.
Suppose that $\lim_{x\to b^-} \frac{f(x)}{g(x)}$ exists and $\lim_{x\to b^-} \frac{f(x)}{g(x)} > 0$. Prove that $\int_a^b f(x)\,dx$ is convergent if
and only if $\int_a^b g(x)\,dx$ is convergent. (For the reader who is familiar with convergence
tests for series, observe that this exercise is the analog of the Limit Comparison Test
for series, which we will see in Section 9.3.)


Exercise 6.4.10. [Used in Section 6.4 and Exercise 6.4.13.] State and prove the analog
of Theorem 6.4.11 (Comparison Test for Type 2 Improper Integrals) for Type 1
improper integrals.


Exercise 6.4.11. [Used in Section 6.4.] State and prove the analog of Theorem 6.4.12 
(Integration by Substitution for Type 2 Improper Integrals) for Type 1 improper 
integrals. 


Exercise 6.4.12. [Used in Section 6.4 and Exercise 6.4.13.] State and prove the analog
of Theorem 6.4.13 (Integration by Parts for Type 2 Improper Integrals) for Type 1
improper integrals.

Exercise 6.4.13. This exercise discusses the gamma function, which is a generalization
of the notion of factorial (discussed in Example 2.5.12) to the positive real
numbers. (In fact, the gamma function is defined for most complex numbers, though
we will not discuss that here.) See [Art64] for more about the gamma function.

Let $\Gamma\colon (0,\infty) \to \mathbb{R}$ be defined by

\[ \Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\,dt \]

for all $x \in (0,\infty)$.
Let $x \in (0,\infty)$. The integral in the definition of $\Gamma(x)$ is improper. More precisely,
if we break up the integral as

\[ \int_0^1 e^{-t} t^{x-1}\,dt + \int_1^\infty e^{-t} t^{x-1}\,dt, \]

then the first of these integrals is a Type 2 improper integral for each $x \in (0,1)$, and
the second of these integrals is a Type 1 improper integral for each $x \in (0,\infty)$.

(1) Prove that the integral $\int_0^1 e^{-t} t^{x-1}\,dt$ is convergent.
(2) Prove that the integral $\int_1^\infty e^{-t} t^{x-1}\,dt$ is convergent.
(3) Prove that $\Gamma(x+1) = x\Gamma(x)$.
(4) Let $n \in \mathbb{N}$. Prove that $\Gamma(n) = (n-1)!$.
[Use Exercise 6.3.1 (1), Exercise 6.4.10 and Exercise 6.4.12.]


6.5 Historical Remarks 


The historical remarks for this chapter are very brief. The current chapter discusses 
limits to infinity of functions, which are a slight variation of regular limits of functions, 
and hence much of the history of the material in this chapter is subsumed in the 
historical remarks in Section 3.6. Moreover, limits of sequences are also a type of 
limit to infinity, where $x \to \infty$ (with $x$ being a real number) is replaced with $n \to \infty$
(where $n$ is a natural number), and hence the following historical remarks also have
some overlap with the remarks in Section 8.5. 


Ancient World 


Parmenides of Elea (c. 515-c. 450 BCE) believed that motion is an illusion. His
disciple Zeno of Elea (c. 490-c. 425 BCE) provided four famous arguments to show
that there is no motion. For example, Zeno’s third argument, the Arrow, says that at
every instant, an arrow is in exactly one place, so that it cannot really move. A modern
approach to this problem is to think of the arrow as going through infinitely many
instants, and we then have to consider the indeterminate form $0 \cdot \infty$.


Medieval Period 


Bhaskara II (1114-1185) believed in the infinite, and in 1150 essentially said that
$\frac{1}{0} = \infty$ and $\infty + a = \infty$. These formulas are reminiscent of our rules about limits to
infinity.

Seventeenth Century


Type 1 improper integrals make use of limits to infinity, and an early example of such
an integral was computed by Evangelista Torricelli (1608-1647), who showed that 
rotating an infinite hyperbola yielded a finite volume. Torricelli claimed to be the first 
person to show that an infinitely large object can have finite content, though the idea 
may have occurred to Oresme much earlier, and perhaps to Fermat and Roberval as 
well. 

The symbol “$\infty$” that we now use for infinity is due to John Wallis (1616-1703)
in 1659. He used this symbol with two different meanings, first as the number of
lines into which a region is divided, and second as the thing to which $n$ goes as we
subdivide a region into more and more pieces. This multiplicity of meanings of the
symbol $\infty$, and in general of meanings of the word “infinity,” persists to this day, and
it can only be clarified by rigorous definitions for each context where this symbol and
word are used.

In 1696 Guillaume de l’Hôpital (1661-1704) published the first printed textbook
on differential calculus, Analyse des infiniment petits pour l’intelligence des lignes
courbes. He never published a textbook on integral calculus. Many of the ideas in
l’Hôpital’s book were due to Johann Bernoulli (1667-1748), who was paid by the
nobleman l’Hôpital. This textbook includes what we now call l’Hôpital’s Rule in the
$\frac{0}{0}$ case, though it should presumably be called Bernoulli’s Rule; the proof given for
this result would not be considered rigorous today.


Eighteenth Century 


Leonhard Euler (1707-1783) freely used infinitely large and infinitely small numbers
(the latter being infinitesimals). He let $\omega$ denote an infinitely small number, and he
let $j = \frac{x}{\omega}$, where $x$ is a positive real number, so that $j$ is infinitely large. He then
wrote equations such as $j - 1 = j$ and $\frac{j-1}{j} = 1$, which we would write today as
$\lim_{n\to\infty} (n-1) = \infty$ and $\lim_{n\to\infty} \frac{n-1}{n} = 1$, respectively. Euler’s intuition about infinity was
later justified by the rigorous treatment of infinitesimals by Robinson, as mentioned
at the end of Section 3.6.


Nineteenth Century 


Carl Friedrich Gauss (1777-1855) did not accept infinitely large quantities, and 
instead used an inequality technique to prove that some limits exist, though such 
proofs were not rigorous by subsequent standards. He implicitly used some results that 
we now prove, such as the Monotone Convergence Theorem, and did not explicitly 
state all of the definitions he used, such as limits to infinity. 

Augustin Louis Cauchy (1789-1857), who gave the first rigorous definition of 
definite integrals for continuous functions on closed bounded intervals, also defined
Type 1 improper integrals for continuous functions on unbounded intervals, and
Type 2 improper integrals for functions on closed bounded intervals that have isolated 
discontinuities. 


7 


Transcendental Functions 


7.1 Introduction 


In previous chapters of this book we used various standard functions, called elementary 
functions, to provide examples of the concepts under discussion. These functions are 
familiar to the reader from precalculus and calculus courses, and are found in many 
applications of mathematics. We are now in a position to give a rigorous treatment of 
the elementary functions we have been using. 

The most widely used elementary functions are the linear, polynomial, rational, 
exponential, logarithmic and trigonometric functions. The first three of these are 
called algebraic functions, and the second three are called transcendental functions. 
The algebraic functions are simple to define; polynomial functions (including linear 
functions) were defined in Definition 2.5.10, and rational functions are just quotients 
of polynomial functions. The transcendental functions, by contrast, are much harder 
to define than the algebraic ones, and the definitions given in precalculus and calculus 
courses are often rather informal. For rigorous definitions of these functions, and 
proofs that they behave as expected, real analysis is needed. In fact, real analysis is 
even needed for a rigorous definition of $x^r$ when $r$ is irrational.

Contrary to the approach in precalculus courses, where power functions are 
defined first, and then exponential functions are defined in terms of power functions, 
and then logarithms are defined in terms of exponentials, here we take the standard 
rigorous approach, which reverses the process by starting with logarithms (defined in 
terms of integration), and then defining exponentials in terms of logarithms, and then 
defining power functions in terms of exponentials. We then define the sine and cosine 
functions, which are trickier to define than logarithms and exponentials. We do not 
discuss the other four standard trigonometric functions, because they can be expressed 
in terms of sine and cosine. We conclude the chapter with further discussion of the 
number 7, which is first defined as part of our discussion of sine and cosine. Although 
our treatment of sine and cosine will be unrelated to logarithms and exponentials, 
except somewhat by analogy, in fact these trigonometric functions are related to 
exponentials (and hence to logarithms) via complex numbers, though it is beyond the 
scope of this book to discuss such matters; see any introductory complex analysis
text, for example [BC09, Chapter 3], for details. 

The problem with the approach to defining exponentials and logarithms in calculus
courses is not the definition of exponential functions in terms of power functions, nor
the definition of logarithms in terms of exponentials (both those steps are fine); the
problem is in the informal definition used for power functions. Let $x \in (0,\infty)$. It
is clear intuitively what $x^n$ means for all $n \in \mathbb{N}$; see Definition 2.5.6 for a formal
definition that captures this intuitive idea. It is also simple to extend this definition
to all $n \in \mathbb{Z}$, as we saw in Definition 2.5.8. It is also not hard, intuitively, to define
$x^q$ for all $q \in \mathbb{Q}$. If $a,b \in \mathbb{N}$, then we could define $x^{a/b} = \sqrt[b]{x^a}$; to be truly rigorous
here one needs to provide a rigorous definition of $\sqrt[n]{x}$ for all $x > 0$ and $n \in \mathbb{N}$, which
we did in Exercise 3.5.6, though which is not usually done in a calculus class. The
real problem occurs when we want to define $x^r$ for an irrational number $r$. At best
in a calculus class, it is stated informally that such a definition can be made using
limits of sequences, where one first finds a sequence $\{c_n\}_{n=1}^\infty$ of rational numbers that
converges to $r$, and one then defines $x^r$ as the limit of the values of $x^{c_n}$ as $n$ goes to
infinity, though of course the details of such a construction are omitted (the details
would include a proof that the desired limit always exists, and that it is independent
of the choice of the sequence that converges to $r$); at worst, it is not even stated that
some sort of technically complicated definition of $x^r$ is needed. In principle, rather
than using sequences, it would be possible to formulate the definition of $x^r$ using least
upper bounds, as follows. If $x > 1$, we could define $x^r$ to be $\operatorname{lub}\{x^q \mid q \in \mathbb{Q} \text{ and } q < r\}$;
if $0 < x < 1$, we could define $x^r$ to be $\operatorname{glb}\{x^q \mid q \in \mathbb{Q} \text{ and } q < r\}$. Of course, one would
first have to use the Least Upper Bound Property and the Greatest Lower Bound
Property to verify that this least upper bound and this greatest lower bound exist, and
once that is established, it would take a bit of effort to prove that this definition of $x^r$
behaves as one would want (for example that the function $x^r$ is differentiable, and has
the expected derivative). Such an approach can indeed be made to work, as seen in
[Olm62, Sections 1102-1105]. Fortunately there is a nicer way to proceed that avoids
this direct use of least upper bounds and greatest lower bounds, as we will soon see.
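
The informal sequence construction described above can be explored numerically. The following Python sketch is only an illustration, not a rigorous definition: the function names are hypothetical, the rationals are taken to be decimal truncations of r (one choice among many), and floating-point roots stand in for exact b-th roots.

```python
from fractions import Fraction
import math

def rational_approximations(r, count):
    """Return the decimal truncations of r to 1, 2, ..., count places, as Fractions."""
    return [Fraction(math.floor(r * 10**n), 10**n) for n in range(1, count + 1)]

def power_via_rationals(x, r, count=6):
    """Approximate x**r by the values x**(a/b) for rationals a/b approaching r."""
    values = []
    for c in rational_approximations(r, count):
        a, b = c.numerator, c.denominator
        # x**(a/b) is computed as (b-th root of x) raised to the integer power a.
        values.append((x ** (1.0 / b)) ** a)
    return values

if __name__ == "__main__":
    for value in power_via_rationals(2.0, math.sqrt(2.0)):
        print(value)
```

For x > 1 the printed values increase toward a single number, which is consistent with the least upper bound formulation mentioned in the same paragraph.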

In our discussion of logarithms and exponentials, we will focus on the natural
logarithm and $e^x$, and pay little attention to other logarithmic and exponential functions,
because the natural logarithm and $e^x$ are the best behaved among their peers, and there
is very little reason to use the other logarithmic or exponential functions. Historically,
base 10 logarithms were used for computational purposes prior to the widespread use 
of computers and calculators, though given the existence of such technology, there is 
little need for base 10 logarithms any more. 


7.2 Logarithmic and Exponential Functions 


We start with a discussion of logarithms. Although logarithms were invented as a tool 
for doing numerical calculations in the pre-computer era, a use for which logarithms 
are no longer needed, we now view logarithms as functions, which turn out to be 
very useful in a variety of applications. We will focus our attention on the “natural 
logarithm,” which informally is the logarithm function with base e. However, because
we have not yet defined the number e (we will do so later in this section), we will now 
give a definition of the natural logarithm function that makes no reference to bases in 
general or the number e in particular. Logarithms with other bases will be defined at 
the end of this section, just to show that it can be done. Because the natural logarithm 
function is the only logarithm function we will need, we will often drop the word 
“natural” and refer to it simply as the “logarithm function.” 

The idea behind the definition of the logarithm function is that whereas the 
function itself is somewhat tricky, its derivative is much simpler, and is a function 
that we can deal with using what we have previously seen. Of course, we cannot take 
the derivative of a function that we have not yet defined, but intuitively we know 
what we want the derivative of the natural logarithm function to be, namely, the 
function $f\colon (0,\infty) \to \mathbb{R}$ defined by $f(x) = \frac{1}{x}$ for all $x \in (0,\infty)$. Indeed, we will simply
define the natural logarithm to be a function whose derivative is the function f. The 
key ingredients in our construction of the natural logarithm will be the Fundamental 
Theorem of Calculus Version I (Theorem 5.6.2), and the fact that the function f is 
continuous, using Example 3.3.3 (2), and hence locally integrable, as remarked after 
Definition 6.4.1. 


Definition 7.2.1. The natural logarithm function is the function $\ln\colon (0,\infty) \to \mathbb{R}$
defined by

\[ \ln x = \int_1^x \frac{1}{t}\,dt \]

for all $x \in (0,\infty)$. △


Observe that no analogous definition for the exponential function can be given, 
because the derivative of the exponential function is not anything simpler than itself. 
It is for that reason that we will define the exponential function in terms of the natural 
logarithm, and not vice versa. 
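
To see Definition 7.2.1 in action, one can approximate the integral of 1/t from 1 to x numerically and compare the result with a library logarithm. The sketch below is an illustration only: the function name and the choice of the midpoint rule are assumptions made here, not part of the text.

```python
import math

def ln_by_integration(x, steps=100_000):
    """Approximate the integral of 1/t from 1 to x with the midpoint rule."""
    if x <= 0:
        raise ValueError("x must be positive")
    # The width is negative when x < 1, which orients the integral correctly.
    width = (x - 1.0) / steps
    total = 0.0
    for i in range(steps):
        midpoint = 1.0 + (i + 0.5) * width
        total += 1.0 / midpoint
    return total * width

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 6.0):
        print(x, ln_by_integration(x), math.log(x))
```

The two columns of output should agree to many decimal places, and the value at x = 1 is exactly zero because the interval of integration is degenerate.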

We now see some of the familiar properties of the natural logarithm function. 


Theorem 7.2.2.

1. The function $\ln$ is differentiable, and $\ln' x = \frac{1}{x}$ for all $x \in (0,\infty)$.
2. The function $\ln$ is strictly increasing.


Proof.

(1) This fact follows from the Fundamental Theorem of Calculus Version I
(Theorem 5.6.2).

(2) By Part (1) of this theorem, and making use of the fact that $\frac{1}{x} > 0$ for all
$x \in (0,\infty)$, we deduce that $\ln' x > 0$ for all $x \in (0,\infty)$. It then follows from
Theorem 4.5.2 (2) that $\ln$ is strictly increasing.


Because we know the derivative of $\ln$, it is possible to sketch the graph of the
function, as seen in Figure 7.2.1. 

The next set of properties of the natural logarithm function makes use of Defini- 
tion 2.5.6 and Definition 2.5.8. 

Fig. 7.2.1.


Theorem 7.2.3. Let $x,y \in (0,\infty)$, and let $n \in \mathbb{Z}$.

1. $\ln 1 = 0$.
2. $\ln(xy) = \ln x + \ln y$.
3. $\ln\left(\frac{x}{y}\right) = \ln x - \ln y$.
4. $\ln(x^n) = n\ln x$.


Proof.
(1) This fact follows directly from the definition of $\ln$.

(2) Let $h\colon (0,\infty) \to \mathbb{R}$ be defined by $h(t) = \ln(xt)$ for all $t \in (0,\infty)$. Observe that
$h$ is well-defined, because $x,t \in (0,\infty)$ implies that $xt \in (0,\infty)$. We know that $\ln$ is
differentiable by Theorem 7.2.2 (1), and it then follows from Example 4.2.3 (1) and
the Chain Rule (Theorem 4.3.3) that $h$ is differentiable and

\[ h'(t) = \frac{1}{xt} \cdot x = \frac{1}{t} = \ln' t \]

for all $t \in (0,\infty)$. By Lemma 4.4.7 (2) there is some $C \in \mathbb{R}$ such that $h(t) = \ln t + C$
for all $t \in (0,\infty)$. Substituting $t = 1$ into this last equation yields $\ln(1) + C = h(1) =
\ln(x \cdot 1) = \ln x$, and then using Part (1) of this theorem we deduce that $C = \ln x$. It
follows that $h(t) = \ln t + \ln x$ for all $t \in (0,\infty)$. If we substitute $t = y$ in this last
equation, we deduce that $\ln(xy) = \ln x + \ln y$.

(3) As a preliminary step, substituting $x = \frac{1}{y}$ into Part (2) of this theorem yields
$\ln(1) = \ln\left(\frac{1}{y}\right) + \ln y$. Making use of Part (1) of this theorem, it follows that
$\ln\left(\frac{1}{y}\right) = -\ln y$. Hence $\ln\left(\frac{x}{y}\right) = \ln\left(x \cdot \frac{1}{y}\right) = \ln x + \ln\left(\frac{1}{y}\right) = \ln x - \ln y$.

(4) First, we prove that $\ln(x^n) = n\ln x$ for all $n \in \mathbb{N}$ by induction on $n$. If $n = 1$ then
the result is trivial. Now let $n \in \mathbb{N}$. Suppose that the result is true for $n$. By Part (2)
of this theorem together with Definition 2.5.6 we see that $\ln(x^{n+1}) = \ln(x \cdot x^n) =
\ln x + \ln(x^n) = \ln x + n\ln x = (n+1)\ln x$. Hence the result is true for $n+1$, and we
deduce that the result holds for all $n \in \mathbb{N}$.

Next, we note that $x^0 = 1$ by Definition 2.5.8, and Part (1) of this theorem then
implies that $\ln(x^0) = \ln 1 = 0 = 0 \cdot \ln x$. Therefore the result holds for $n = 0$.

Finally, let $m \in \mathbb{N}$. Then $x^{-m} = (x^m)^{-1} = \frac{1}{x^m}$ by Definition 2.5.8. Hence by Parts (1)
and (3) of this theorem, together with what we have just proved, we see that $\ln(x^{-m}) =
\ln\left(\frac{1}{x^m}\right) = \ln 1 - \ln(x^m) = 0 - m\ln x = (-m)\ln x$. Therefore the result holds for all
$n \in -\mathbb{N}$.


We will see in Theorem 7.2.14 (1) that the analog of Theorem 7.2.3 (4) holds for
all $n \in \mathbb{R}$, not just for all $n \in \mathbb{Z}$, but to do so we will have to wait until we have defined
$x^n$ for all $n \in \mathbb{R}$.

We want to define the exponential function as the inverse function of $\ln$, but in
order to be sure that $\ln$ has an inverse function, we need the following lemma. Whereas
it is easy to see that $\ln$ is injective, because it is strictly increasing, it is a bit trickier
to show that $\ln$ is surjective, and we will need to make use of the Intermediate Value
Theorem (Theorem 3.5.2).


Lemma 7.2.4. The function $\ln$ is bijective.


Proof. We know from Theorem 7.2.2 (2) that $\ln$ is strictly increasing, and it follows
immediately that $\ln$ is injective.

Let $y \in \mathbb{R}$. We need to show that there is some $w \in (0,\infty)$ such that $y = \ln w$. By
Exercise 7.2.1 we know that $\ln 2 > 0$, and we can therefore apply Exercise 2.6.12 to
deduce that there is some $m \in \mathbb{N}$ such that $y \in [-m\ln 2, m\ln 2]$. By Theorem 7.2.3 (4)
it follows that $y \in [\ln(2^{-m}), \ln(2^m)]$. If $y = \ln(2^{-m})$, then we could take $w = 2^{-m}$;
observe that $2^{-m} = \left(\frac{1}{2}\right)^m > 0$, so $w$ would be in $(0,\infty)$. If $y = \ln(2^m)$, we could
take $w = 2^m$. Now suppose that $y \in (\ln(2^{-m}), \ln(2^m))$. By Theorem 7.2.2 (1) $\ln$ is
differentiable, and therefore $\ln$ is continuous by Theorem 4.2.4. We can now apply
the Intermediate Value Theorem (Theorem 3.5.2) to $\ln|_{[2^{-m},2^m]}$ to deduce that there
is some $w \in (2^{-m}, 2^m)$ such that $y = \ln w$.
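
The surjectivity argument in this proof is constructive enough to imitate numerically. The Python sketch below follows the same two steps under stated assumptions: it first finds m with |y| at most m ln 2, and then uses bisection on the interval from 2^(-m) to 2^m, relying on ln being continuous and strictly increasing. The function name is hypothetical, and the library function math.log stands in for ln.

```python
import math

def solve_ln(y, tolerance=1e-12):
    """Find w > 0 with ln(w) approximately y, by bisection on [2**-m, 2**m]."""
    # Step 1: choose m so that y lies in [-m ln 2, m ln 2].
    m = 1
    while m * math.log(2.0) < abs(y):
        m += 1
    low, high = 2.0 ** (-m), 2.0 ** m
    # Step 2: bisect, keeping ln(low) <= y <= ln(high) throughout.
    while high - low > tolerance * high:
        mid = (low + high) / 2.0
        if math.log(mid) < y:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

if __name__ == "__main__":
    print(solve_ln(1.0))   # close to the number e defined later in this section
    print(solve_ln(-3.5))
```

Bisection is used here only as a computational stand-in for the Intermediate Value Theorem, which guarantees existence but gives no algorithm.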


We now turn to exponential functions, which are extremely widespread in mathe- 
matics and its applications. In precalculus and calculus courses, such functions are 
usually defined by saying $f(x) = a^x$ for all $x \in \mathbb{R}$, where $a$ is some positive real
number. The most useful exponential function, and the one which we will refer to
as “the exponential function,” is the one usually defined informally as $e^x$. As much
as this formulation of exponential functions seems reasonable intuitively, there are 
two problems with this approach, which are that we have not yet defined how to raise 
a real number to an arbitrary real number power (we have only dealt with integer 
powers so far), and that we have not yet defined the number e. Instead, we will define 
the exponential function in terms of the natural logarithm, with no mention made of 
the number e yet. 

The crucial tool in the definition of the exponential function is the notion of 
an inverse function; the reader should review the discussion of inverse functions in 
Section 4.6. 

The following definition makes sense by Lemma 7.2.4. 


Definition 7.2.5. The exponential function is the function $\exp\colon \mathbb{R} \to (0,\infty)$ defined
by $\exp = \ln^{-1}$. △

Observe that the codomain of $\exp$ is $(0,\infty)$ rather than $\mathbb{R}$, which makes no difference
in practice, but was needed for the sake of $\exp$ being the properly defined inverse
function of the natural logarithm.

The fact that $\exp$ is the inverse function of $\ln$ can be expressed in two equivalent
ways, which are stated in the two parts of the following lemma. The second part is
more commonly taught in courses such as precalculus, and is included for the sake of
familiarity; the first part is more useful in proofs.


Lemma 7.2.6. Let $x \in (0,\infty)$, and let $y \in \mathbb{R}$.

1. $\exp(\ln x) = x$ and $\ln(\exp y) = y$.
2. $\ln x = y$ if and only if $\exp y = x$.

Proof.

(1) This part of the lemma follows immediately from the fact that $\exp = \ln^{-1}$,
together with the definition of inverse functions; see [Blo10, Section 4.3] for a
discussion of inverse functions.

(2) Suppose that $\ln x = y$. Then $\exp(\ln x) = \exp y$, and hence Part (1) of this lemma
implies that $x = \exp y$. The proof of the other implication is similar, and we omit the
details.


As mentioned in Section 4.6, the graph of an inverse function is the reflection 
of the graph of the original function in the line y = x. We saw a sketch of the graph 
of $\ln$ in Figure 7.2.1, and so we immediately obtain the graph of $\exp$, as seen in
Figure 7.2.2. 


Fig. 7.2.2. 


We now see some familiar properties of the exponential function. 

Theorem 7.2.7.

1. The function $\exp$ is bijective.
2. The function $\exp$ is differentiable, and $\exp' x = \exp x$ for all $x \in \mathbb{R}$.
3. The function $\exp$ is strictly increasing.

Proof.

(1) We know that $\ln$ is bijective by Lemma 7.2.4. Given that $\exp = \ln^{-1}$, it follows
from a standard fact about functions that $\exp$ is bijective; see [Blo10, Exercise 4.4.13]
for details.

(2) By Theorem 7.2.2 (1) we know that $\ln$ is differentiable, and as noted in the
proof of Part (2) of that theorem we know that $\ln' x > 0$ for all $x \in (0,\infty)$. Therefore
Theorem 4.6.4 (3) (4) applied to $\ln$ imply that $\exp = \ln^{-1}$ is differentiable, and that

\[ \exp' x = [\ln^{-1}]'(x) = \frac{1}{\ln'(\ln^{-1}(x))} = \frac{1}{\ln'(\exp x)} = \frac{1}{\frac{1}{\exp x}} = \exp x \]

for all $x \in \mathbb{R}$.

(3) Using Part (2) of this theorem, and making use of the fact that $\exp x > 0$ for
all $x \in \mathbb{R}$, we see that $\exp' x > 0$ for all $x \in \mathbb{R}$. It then follows from Theorem 4.5.2 (2)
that $\exp$ is strictly increasing.


Theorem 7.2.8. Let $x,y \in \mathbb{R}$, and let $n \in \mathbb{Z}$.

1. $\exp 0 = 1$.
2. $\exp(x+y) = \exp x \cdot \exp y$.
3. $\exp(x-y) = \frac{\exp x}{\exp y}$.
4. $\exp(nx) = [\exp x]^n$.


Proof. We will prove Parts (1) and (2), leaving the rest to the reader in Exercise 7.2.6.
In this proof we will make repeated use of Lemma 7.2.6 (1).

(1) We know by Theorem 7.2.3 (1) that $\ln 1 = 0$. Hence $\exp 0 = \exp(\ln 1) = 1$.

(2) Because $\ln$ is bijective, by Lemma 7.2.4, there are $w,z \in (0,\infty)$ such that
$\ln w = x$ and $\ln z = y$. It follows from Theorem 7.2.3 (2) that $\ln(wz) = \ln w + \ln z$.
Hence $\exp(x+y) = \exp(\ln w + \ln z) = \exp(\ln(wz)) = wz = \exp x \cdot \exp y$.


We will see in Theorem 7.2.14 (2) that the analog of Theorem 7.2.8 (4) holds for 
all n € R, not just for all n € Z. 

We are now ready to give the definition of the number $e$; this definition makes
sense because $\ln\colon (0,\infty) \to \mathbb{R}$ is bijective.


Definition 7.2.9. The number $e$ is the unique number in $(0,\infty)$ such that $\ln e = 1$. △


Some texts define $e$ as

\[ e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n. \]

To make such a definition rigorous, it would be necessary first to define what is meant
by limits of sequences, and second to prove that this particular limit exists, neither
of which we have learned about yet. We will do these things later in the text, and we
will indeed see that the definition of $e$ given in Definition 7.2.9 is equivalent to the limit
definition of $e$. More strongly, we will see in Example 8.4.3 that

\[ \exp r = \lim_{n\to\infty} \left(1 + \frac{r}{n}\right)^n \]

for all $r \in \mathbb{R}$. Both our definition of $e$ and this other definition in terms of sequences
are useful in certain circumstances. Of course, it is not legitimate to define the same 
thing in two different ways. If one wants to use both approaches to e, one has to choose 
one of these definitions of e, and then prove that the other definition is equivalent 
to the chosen one. We choose the definition of e in terms of the natural logarithm, 
because it requires fewer technicalities. 
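
The claimed agreement between the two approaches to e can be checked numerically, though of course such a check proves nothing. The following Python sketch evaluates the expression (1 + r/n)^n for increasing n and compares it with the library value of exp r; the function name is hypothetical, and floating-point arithmetic limits how large n can usefully be.

```python
import math

def compound_limit(r, n):
    """Evaluate (1 + r/n)**n, the n-th term of the sequence whose limit is exp(r)."""
    return (1.0 + r / n) ** n

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, compound_limit(1.0, n), compound_limit(-2.0, n))
    print("library values:", math.e, math.exp(-2.0))
```

The convergence is slow, with an error roughly proportional to 1/n, which is one practical reason the sequence definition is less convenient to work with than the definition via the natural logarithm.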

The following lemma shows that with respect to the integers, the exponential 
function is precisely what we want it to be. 


Lemma 7.2.10. Let $n \in \mathbb{Z}$. Then $\exp n = e^n$. In particular $\exp 1 = e$.


Proof. First, we note that because $\ln e = 1$, then by Lemma 7.2.6 (1) we see that
$\exp 1 = \exp(\ln e) = e$. We then use Theorem 7.2.8 (4) to see that $\exp n = \exp(1 \cdot n) =
[\exp 1]^n = e^n$.


We will see in Theorem 7.2.14 (3) that the analog of Lemma 7.2.10 holds for all
$n \in \mathbb{R}$, not just for all $n \in \mathbb{Z}$.

We now turn to power functions, which have the form $f(x) = x^r$ for all $x \in (0,\infty)$,
where $r$ is a real number. Power functions are among the most widely used functions,
and no discussion of functions would be complete without them. Even though power
functions might appear to be a more elementary type of function than logarithms and
exponentials, we have waited until now to define power functions because the intuitive
way of defining power functions is difficult to make rigorous, as was mentioned in
Section 7.1, but there is an alternative, and easier, way to proceed that makes use of
the natural logarithm and the exponential function.

In our discussion of $x^r$, we restrict attention to $x > 0$ rather than all $x \in \mathbb{R}$ because
we want $x^{1/2}$ to equal $\sqrt{x}$, and we cannot take square roots of negative numbers. We
ignore $x = 0$ to avoid special cases, though we can simply define $0^r = 0$ for all
$r \in \mathbb{R} - \{0\}$. We do not attempt to define “$0^0$,” because that is an indeterminate form,
similarly to $0 \cdot \infty$ and $\infty - \infty$; see Exercise 6.3.8 for evidence that $0^0$ is an indeterminate
form.


Definition 7.2.11. Let $r \in \mathbb{R}$. Let $p_r\colon (0,\infty) \to (0,\infty)$ be defined by $p_r(x) =
\exp(r\ln x)$ for all $x \in (0,\infty)$. △


We will switch to more standard notation, and write $x^r$ instead of $p_r(x)$. This more
standard notation is not entirely proper, however, because the name of the function
(in this case $p_r$) should not include any specific elements of the domain (in this case
$x$). When we write $p_r$ we write the name of the function, and when we write $p_r(x)$ we
mean an element of the codomain. With the expression “$x^r$” there is no name of the
function to write that does not include $x$, so it is not a proper name for a function, but
we will write $x^r$ nonetheless, because it is completely standard.
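
Definition 7.2.11 translates directly into a short computation. The Python sketch below is an illustration under the assumption that the library functions math.exp and math.log play the roles of exp and ln; the function name power is chosen here for convenience and is not from the text.

```python
import math

def power(x, r):
    """Compute x**r for x > 0 as exp(r * ln x), following the definition of p_r."""
    if x <= 0:
        raise ValueError("the definition only applies to x in (0, infinity)")
    return math.exp(r * math.log(x))

if __name__ == "__main__":
    print(power(2.0, 0.5), math.sqrt(2.0))   # the two values should nearly agree
    print(power(3.0, 4))                     # close to 3*3*3*3
    print(power(5.0, 0))                     # exp(0) = 1
```

The comparison with integer powers and square roots previews the consistency check carried out later in this section, where the new definition is shown to agree with the earlier definitions of integer powers.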

We now see some standard properties of $x^r$.

Theorem 7.2.12. Let $x,y \in (0,\infty)$, and let $r,s \in \mathbb{R}$.

1. $1^r = 1$, and $x^0 = 1$, and $x^1 = x$.
2. $x^{r+s} = x^r x^s$.
3. $x^{r-s} = \frac{x^r}{x^s}$.
4. $(x^r)^s = x^{rs}$.
5. $(xy)^r = x^r y^r$.
6. $\left(\frac{x}{y}\right)^r = \frac{x^r}{y^r}$.
7. $x^{-r} = \frac{1}{x^r}$.


Proof. We will prove Part (2), leaving the rest to the reader in Exercise 7.2.9.

(2) By Theorem 7.2.8 (2) we see that $x^r x^s = \exp(r\ln x) \cdot \exp(s\ln x) = \exp(r\ln x +
s\ln x) = \exp((r+s)\ln x) = x^{r+s}$.


In Section 7.1 an informal, albeit technically complicated, definition of $x^{a/b}$ was
suggested, where a,b € N. The reader is asked in Exercise 7.2.11 (2) to show that this 
suggested definition is in fact a consequence of Definition 7.2.11 and Theorem 7.2.12. 


Theorem 7.2.13. Let $r \in \mathbb{R}$.

1. The function $x^r$ is differentiable, and $[x^r]' = rx^{r-1}$ for all $x \in (0,\infty)$.
2. If $r > 0$, the function $x^r$ is strictly increasing; if $r = 0$, the function $x^r$ is
constant; and if $r < 0$, the function $x^r$ is strictly decreasing.
3. If $r \ne 0$, the function $x^r$ is bijective.


Proof. We will prove Part (1), leaving the rest to the reader in Exercise 7.2.10.

(1) Combining Theorem 4.3.1 (3), the Chain Rule (Theorem 4.3.3),
Theorem 7.2.2 (1) and Theorem 7.2.7 (2), we deduce that the function $x^r = \exp(r\ln x)$ is
differentiable, and $[x^r]' = [\exp(r\ln x)]' = \exp(r\ln x) \cdot r \cdot \frac{1}{x} = x^r \cdot r \cdot \frac{1}{x} = rx^{r-1}$ for all
$x \in (0,\infty)$.


Before proceeding any further in our discussion of $x^r$, we need to clarify one
important matter. For each $x \in (0,\infty)$ and $n \in \mathbb{Z}$, we have actually defined the value
of $x^n$ in two different ways, once in the combination of Definition 2.5.6 and
Definition 2.5.8, and once in Definition 7.2.11; the latter definition is entirely independent
of the former. If our definitions are to make sense, we need to verify that they
both yield the same result. Let $x \in (0,\infty)$. First, by Definition 7.2.11 together with
Lemma 7.2.6 (1) and Theorem 7.2.12 (2) we see that if we use our new definition of $x^r$,
then $x^1 = \exp(1 \cdot \ln x) = \exp(\ln x) = x$, and if $n \in \mathbb{N}$ then $x^{n+1} = x^{1+n} = x^1 \cdot x^n = x \cdot x^n$.
Definition 2.5.6 was based upon Definition by Recursion (Theorem 2.5.5), and the
uniqueness in that theorem implies that the definition of $x^n$ for all $n \in \mathbb{N}$ in both
Definition 2.5.6 and Definition 7.2.11 must agree. Again using Definition 7.2.11, this
time with Theorem 7.2.8 (1) and Theorem 7.2.12 (7), we see that $x^0 = \exp(0 \cdot \ln x) =
\exp 0 = 1$, and that if $n \in \mathbb{N}$ then $x^{-n} = \frac{1}{x^n} = (x^n)^{-1}$. Hence we see that Definition 2.5.8
agrees with Definition 7.2.11.

We are now in a position to prove some promised properties of logarithms and 
exponentials that involve power functions. 


Theorem 7.2.14. Let $x \in (0,\infty)$, and let $y, r \in \mathbb{R}$.


1. $\ln(x^r) = r \ln x$.
2. $\exp(ry) = [\exp(y)]^r$.
3. $\exp r = e^r$.
4. $e^{\ln x} = x$ and $\ln(e^y) = y$.


Proof. We will prove Part (1), leaving the rest to the reader in Exercise 7.2.12. 


(1) Using the definition of $x^r$ and Lemma 7.2.6 (1), we see that $\ln(x^r) =
\ln(\exp(r \ln x)) = r \ln x$.


Because of Theorem 7.2.14 (3) we can now switch to the more standard “$e^x$”
notation rather than “$\exp x$.” As with the notation $x^r$, the notation $e^x$ is not a proper
name for a function, but we will use it nonetheless, because it is standard.

We conclude this section with a look at exponential functions and logarithm 
functions with bases other than e, this time starting with exponentials. 


Definition 7.2.15. Let $a \in (0,\infty)$. The exponential function with base $a$ is the
function $\exp_a\colon \mathbb{R} \to (0,\infty)$ defined by $\exp_a x = a^x = \exp(x \ln a)$ for all $x \in \mathbb{R}$. △


By Theorem 7.2.14 (3) we note that $\exp_e = \exp$. As with the notation $x^r$ and $e^x$,
we will use the improper but more standard notation $a^x$ rather than $\exp_a x$.

For any $a \in (0,\infty)$, the function $a^x$ satisfies all the analogous properties of $e^x$ stated
in Theorem 7.2.8; some of these properties are proved in Exercise 7.2.13. The one
place where there is a difference between $e^x$ and $a^x$ when $a \neq e$ is in the formula for
the derivative. The following theorem is derived immediately from Theorem 7.2.7 (2)
and the Chain Rule (Theorem 4.3.3), and we omit the details.


Theorem 7.2.16. Let $a \in (0,\infty)$. The function $a^x$ is differentiable, and $[a^x]' = a^x \ln a$
for all $x \in \mathbb{R}$.


The fact that $e^x$ has a simpler derivative than $a^x$ when $a \neq e$ is the reason why $e^x$
is preferred over exponential functions with other bases.

The following lemma is needed to allow us to define logarithms with arbitrary 
bases. 


Lemma 7.2.17. Let $a \in (0,\infty)$.


1. If $a > 1$, the function $a^x$ is strictly increasing; if $a = 1$, the function $a^x$ is
constant; and if $0 < a < 1$, the function $\exp_a$ is strictly decreasing.
2. If $a \neq 1$, then the function $\exp_a$ is bijective.


Proof. 


(1) By Theorem 7.2.16 we know that $[a^x]' = a^x \ln a$ for all $x \in \mathbb{R}$. By the definition
of $\ln$, it is seen that $\ln a$ is positive, zero or negative if $a > 1$, or $a = 1$, or $0 < a < 1$,
respectively. Because $a^x > 0$ for all $x \in \mathbb{R}$, the derivative $[a^x]'$ has the same sign
as $\ln a$, and the desired result now follows from Theorem 4.5.2.


(2) Suppose that $a \neq 1$. Then by Part (1) of this lemma we know that $\exp_a$ is
either strictly increasing or strictly decreasing. It follows that $\exp_a$ is injective.

Let $z \in (0,\infty)$. By Theorem 7.2.7 (1) we know that $\exp$ is surjective. Hence there
is some $w \in \mathbb{R}$ such that $\exp w = z$. We know by Theorem 7.2.3 (1) that $\ln 1 = 0$, and
by Lemma 7.2.4 that $\ln$ is injective, and $a \neq 1$ implies that $\ln a \neq 0$. It follows that
$\exp_a\left(\frac{w}{\ln a}\right) = \exp\left(\frac{w}{\ln a} \cdot \ln a\right) = \exp w = z$. Therefore $\exp_a$ is surjective.


Lemma 7.2.17 (2) allows us to make the following definition. 


Definition 7.2.18. Let $a \in (0,\infty)$. Suppose that $a \neq 1$. The logarithm function with
base $a$ is the function $\log_a\colon (0,\infty) \to \mathbb{R}$ defined by $\log_a = (\exp_a)^{-1}$. △


Observe that $\log_e = \ln$. The relationship of $\log_a$ to $\exp_a$ for any $a$ is the same as
the relationship between $\ln$ and $\exp$, as stated in the following lemma; the proof of
this lemma is just like the proof of Lemma 7.2.6, and we omit the details.


Lemma 7.2.19. Let $a, x \in (0,\infty)$, and let $y \in \mathbb{R}$. Suppose that $a \neq 1$.


1. $a^{\log_a x} = x$ and $\log_a(a^y) = y$.
2. $\log_a x = y$ if and only if $a^y = x$.


For any $a \in (0,\infty)$ such that $a \neq 1$, the function $\log_a$ satisfies all the analogous
properties of $\ln$ stated in Theorem 7.2.3; some of these properties are proved in
Exercise 7.2.14 (1) (2). As seen in Part (3) of that exercise, one place where there
is a difference between $\log_a$ and $\ln$ when $a \neq e$ is in the formula for the derivative.
As seen in the following lemma, a logarithm function with one base is just as good
as a logarithm function with any other base, and so bases other than $e$ are not really
needed. We defined logarithm functions with bases other than $e$ simply because the
reader has most likely encountered such functions previously, and we wanted to show
that they can be defined rigorously.


Lemma 7.2.20. Let $a, b, x \in (0,\infty)$. Suppose that $a \neq 1$ and $b \neq 1$. Then

$$\log_a x = \frac{\log_b x}{\log_b a}.$$


Proof. Left to the reader in Exercise 7.2.15. 
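
Without giving away the exercise, the statement of Lemma 7.2.20 can be tested on examples. The sketch below is a numerical illustration under stated assumptions: the names `exp_base` and `log_base` are hypothetical, `math.exp` and `math.log` are assumed to stand in for exp and ln, and `log_base` uses the formula $\log_a z = \frac{\ln z}{\ln a}$ suggested by the surjectivity argument in the proof of Lemma 7.2.17 (2).

```python
import math


def exp_base(a: float, x: float) -> float:
    """Exponential with base a > 0, defined as exp(x ln a) (Definition 7.2.15)."""
    if a <= 0:
        raise ValueError("the base must be positive")
    return math.exp(x * math.log(a))


def log_base(a: float, x: float) -> float:
    """Inverse of exp_base(a, .) for a > 0, a != 1, computed as ln x / ln a."""
    if a <= 0 or a == 1 or x <= 0:
        raise ValueError("need a > 0, a != 1 and x > 0")
    return math.log(x) / math.log(a)
```

For instance, `log_base(2.0, 10.0)` and `log_base(5.0, 10.0) / log_base(5.0, 2.0)` agree up to rounding error, as the lemma predicts.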


Reflections 


Students in precalculus and introductory calculus courses sometimes find the idea 
of logarithms confusing, though from a mathematician’s perspective it is not entirely 
clear why that is so. Students do not find the exponential function nearly as troubling, 
and yet the exponential function and the natural logarithm function are not inherently 
very different as functions; they are inverse functions of each other, and they have 
rather analogous properties. From the author’s experience teaching precalculus and 
calculus, he suspects that the difficulty students find with logarithms is because they 
are typically defined in such courses as the inverse functions of exponential functions, 
and the inverse function relationship, especially as usually stated in these courses, 
seems to be confusing—students often find inverse trigonometric functions similarly 
troubling. 

Specifically, in elementary courses the exponential function is defined intuitively
in terms of the number $e$ raised to a power (which is usually explained carefully for
rational powers, and usually glossed over for irrational powers). The natural logarithm
function is then defined to be the function $\ln x$ that satisfies the property that $\ln x = y$
if and only if $e^y = x$, and it is this if and only if statement used as a definition that
is, perhaps, the source of the confusion. This definition of $\ln x$ is somewhat abstract,
and does not resemble the way most familiar functions are defined (which is in terms
of explicit formulas). In the rigorous treatment of logarithms, such as found in this
text, we define logarithm directly rather than as an inverse function. Of course, we
then need to define the exponential function as the inverse function of logarithm, and
so inverse functions cannot be avoided; it is simply that logarithm gets the initial
treatment this time. Moreover, rather than phrasing the inverse function relation
between logarithm and exponential as is done in elementary courses, we emphasize
the equivalent conditions $\ln(e^x) = x$ and $e^{\ln x} = x$, which are much more natural when
thinking of exponentials and logarithms as functions.

In addition to the fact that logarithms are discussed before exponentials, the other
major difference between the treatment of these functions in a real analysis course
and in elementary courses is the fact that whereas in elementary courses one thinks
of the natural logarithm function as the logarithm with base $e$, in the real analysis
approach we define $\ln x$ and $e^x$ without any reference to a “base” (and indeed without
any reference to the number $e$). Moreover, we use the exponential function $e^x$ and
the logarithm function $\ln x$ virtually exclusively, ignoring those with other bases;
logarithms and exponentials with other bases were important historically, but are not
needed in real analysis.

The reader might wonder why we did not try to define the exponential function
$e^x$ by an integral, and then define the natural logarithm as the inverse function of $e^x$.
The reason is that we want the derivative of $e^x$ to be itself, and therefore we cannot
define it in terms of the integral of itself. By contrast, we want the derivative of $\ln x$ to
be $\frac{1}{x}$, and this latter function is continuous, and hence integrable on closed bounded
intervals, which is what makes the logarithm function easy to define as an integral.
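
To make the last remark concrete, the integral of $\frac{1}{t}$ from $1$ to $x$ can be approximated numerically. The following is a rough sketch and not a rigorous computation: the names `simpson` and `ln_by_integral` are hypothetical, and composite Simpson's rule is an assumed choice of quadrature that does not appear in the text.

```python
import math


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior nodes alternate between weights 4 (odd index) and 2 (even index).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def ln_by_integral(x: float, n: int = 1000) -> float:
    """Approximate the integral of 1/t from 1 to x, for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return simpson(lambda t: 1.0 / t, 1.0, x, n)
```

When $0 < x < 1$ the step size is negative, so the approximation is negative, matching the convention for integrals with reversed limits.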


Exercises 


Exercise 7.2.1. [Used in Lemma 7.2.4.] Prove that $\ln 2 > 0$.


Exercise 7.2.2. [Used in Example 10.4.11.] Prove that $\ln x$ is infinitely differentiable,
and that $\ln^{(n)}(x) = \frac{(-1)^{n-1}(n-1)!}{x^n}$ for all $x \in (0,\infty)$ and all $n \in \mathbb{N}$.


Exercise 7.2.3. [Used in Exercise 8.4.13.] Let $a, b \in (0,\infty)$. Suppose that $a < b$. Prove
that $\frac{b-a}{b} < \ln\left(\frac{b}{a}\right) < \frac{b-a}{a}$. [Use Exercise 5.5.3.]


Exercise 7.2.4. Prove that $\ln(1 + x) < x$ for all $x \in (0,\infty)$.


Exercise 7.2.5. [Used in Example 6.3.8.] Prove that $\lim_{x \to 0^+} \ln x = -\infty$.


Exercise 7.2.6. [Used in Theorem 7.2.8.] Prove Theorem 7.2.8 (3) (4).


Exercise 7.2.7. Prove that $\exp x > 1 + x$ for all $x \in (0,\infty)$.


Exercise 7.2.8. [Used in Example 6.4.6.]


(1) Using only the properties of $e^x$ stated in Section 7.2, prove that $\lim_{x \to \infty} e^{-x} = 0$.
(2) Prove that lime“ =Oand lim e~ =0. 


Exercise 7.2.9. [Used in Theorem 7.2.12.] Prove Theorem 7.2.12 (1) (3) (4) (5) (6) (7). 
Exercise 7.2.10. [Used in Theorem 7.2.13.] Prove Theorem 7.2.13 (2) (3). 


Exercise 7.2.11. [Used in Example 6.4.8, Section 7.2 and Example 10.2.8.] Let $x \in
(0,\infty)$, and let $n \in \mathbb{N}$. In Exercise 3.5.6 we defined $\sqrt[n]{x}$. The purpose of this exercise is
to show that that definition is compatible with the definition of $x^r$ given in the present
section.


(1) Prove that $\sqrt[n]{x} = x^{\frac{1}{n}}$.
(2) Let $a, b \in \mathbb{N}$. Prove that $x^{\frac{a}{b}} = \sqrt[b]{x^a} = (\sqrt[b]{x})^a$.
(3) Prove that lim Wx =0. 


x 


(4) The method of Exercise 3.5.6 might appear to be conceptually very different 
from the approach to roots using fractional powers, because the former uses 
the Intermediate Value Theorem and the latter uses power functions, but in 
fact power functions ultimately rely upon the Intermediate Value Theorem as 
well; show where. 


Exercise 7.2.12. [Used in Theorem 7.2.14.] Prove Theorem 7.2.14 (2) (3) (4).


Exercise 7.2.13. [Used in Section 7.2.] Let $a \in (0,\infty)$, and let $x, y \in \mathbb{R}$.

(1) Prove that $a^{x+y} = a^x a^y$.
(2) Prove that $a^{xy} = [a^x]^y$.


Exercise 7.2.14. [Used in Section 7.2 and Exercise 7.2.15.] Let $a \in (0,\infty)$, let $x, y \in
(0,\infty)$ and let $r \in \mathbb{R}$. Suppose that $a \neq 1$.


(1) Prove that $\log_a(xy) = \log_a x + \log_a y$.
(2) Prove that $\log_a(x^r) = r \log_a x$.
(3) Prove that the function $\log_a$ is differentiable, and $\log_a' x = \frac{1}{x \ln a}$.


Exercise 7.2.15. [Used in Lemma 7.2.20.] Prove Lemma 7.2.20.
[Use Exercise 7.2.14 (2).]


Exercise 7.2.16. [Used in Exercise 6.4.1 and Example 9.3.7.] This exercise makes use
of Exercise 6.2.15. Let $r \in \mathbb{R}$. Prove that

$$\lim_{x \to \infty} x^r = \begin{cases} \infty, & \text{if } r > 0 \\ 1, & \text{if } r = 0 \\ 0, & \text{if } r < 0. \end{cases}$$


7.3 Trigonometric Functions 


In addition to the exponential and logarithmic functions, the other commonly used 
transcendental functions are the trigonometric functions. Although the origin of these
functions is in the study of triangles, they have many other important applications
both inside and outside of mathematics, for example in the study of oscillatory motion 
(springs, pendulum, waves and more). Our concern in this text is not with triangles at 
all, but rather with giving the trigonometric functions a rigorous definition, and then 
proving their basic properties. 

There are six basic trigonometric functions, which are sine, cosine, tangent, secant, 
cosecant and cotangent. However, the latter four can be defined in terms of sine and 
cosine, and so we will restrict our attention to these two functions. 

In precalculus and calculus courses the trigonometric functions are usually defined 
in terms of the unit circle. Conceptually this approach is the nicest way of defining the 
trigonometric functions, but, as usually presented, it is not completely rigorous, unless 
it is preceded by a rigorous discussion of arc length, which is virtually never done in 
that context due to the technical difficulties it would entail (as seen in Section 5.9). 
Instead of using the unit circle, we will give a definition of sine and cosine that is 
somewhat analogous to our treatment of logarithmic and exponential functions in 
Section 7.2, though defining the trigonometric functions is a bit more complicated. 

When we discussed logarithms and exponentials, we started with the natural 
logarithm first because the derivative of the natural logarithm is a very simple function, 
namely, the function $f\colon (0,\infty) \to \mathbb{R}$ defined by $f(x) = \frac{1}{x}$ for all $x \in (0,\infty)$, which we
already knew was continuous and hence integrable. By contrast, the derivative of the 
exponential function is not anything simpler than itself, and that is why we defined 
the exponential function as the inverse of the natural logarithm, and not vice versa. 

A similar situation occurs with the sine function. Its derivative is not anything
simpler than itself, and so we cannot define the sine function directly as an integral. 
However, the derivative of the arcsine function is a much simpler function than arcsine, 
and it does not involve trigonometric functions at all, and it is a function that we 
know is continuous, and hence is integrable. We will therefore start our treatment of 
the trigonometric functions with a definition of the arcsine function, and then define 
sine in terms of arcsine, and cosine in terms of sine. It is assumed that the reader is 
informally familiar with the arcsine function as the inverse of sine restricted to the 
domain $[-\frac{\pi}{2}, \frac{\pi}{2}]$; here we will give a definition of the arcsine function in terms of an
integral.

Recall the definition of the natural logarithm in Definition 7.2.1. To define the
arcsine function analogously we will use the integral

$$\int_0^x \frac{1}{\sqrt{1-t^2}}\,dt.$$

It might not be apparent to the reader at this point what the relation is between
integrating the function $f\colon (-1,1) \to \mathbb{R}$ defined by $f(x) = \frac{1}{\sqrt{1-x^2}}$ for all $x \in (-1,1)$
and the trigonometric functions, but, as the reader will see in Section 7.4, this integral
computes the arc length of part of the unit circle, which in turn is used in the informal
definition of sine and cosine. We note that the above integral is a regular integral for
$x \in (-1,1)$, but it is an improper integral for $x = 1$ and $x = -1$, because $\frac{1}{\sqrt{1-t^2}}$ is not
defined for $t = 1$ or $t = -1$. Hence we will need to make use of improper integrals,
discussed in Section 6.4, in our discussion of sine and cosine; we will also make
use of the related topic of limits to infinity, discussed in Section 6.2. The reader who has
not read these two sections from Chapter 6, but nonetheless wants to learn about the
trigonometric functions, can still read the present section, though a few things will
have to be accepted without proof.

Not only does the definition of sine and cosine have the complication of needing
improper integrals, which did not arise in our discussion of logarithmic and exponential
functions, but there is another difficulty in the definition of sine and cosine
that was not encountered in the definition of logarithms and exponentials, which is
that using the arcsine to define sine and cosine yields the definition of these functions
only on $[-\frac{\pi}{2}, \frac{3\pi}{2}]$. To extend the definition to all of $\mathbb{R}$, we need to have sine and cosine
“repeat themselves” every $2\pi$, and as our first step toward the definition of the two
trigonometric functions we now consider the general idea of functions that repeat
themselves. This concept will also be used in our discussion of a continuous but
nowhere differentiable function in Section 10.5.


Definition 7.3.1. Let $f\colon \mathbb{R} \to \mathbb{R}$ be a function. The function $f$ is periodic if there is
some $P \in (0,\infty)$ such that $f(x + P) = f(x)$ for all $x \in \mathbb{R}$. The number $P$ is called the
period of $f$. △


Observe that if a function $f$ has period $P$, then it has period $nP$ for any $n \in \mathbb{N}$.

The most familiar examples of periodic functions are sine and cosine, though 
many other periodic functions exist. Periodic functions play an important role in some 
parts of mathematics and in applications of mathematics, because they are useful for 
describing repeating phenomena such as waves and springs. 

The simplest way to construct periodic functions is to take a function with domain 
a non-degenerate interval, and extend it to all of R. The following lemma states that 
this type of construction works. 


Lemma 7.3.2. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$h = b - a$.


1. Let $k\colon [a,b) \to \mathbb{R}$ be a function. Then there is a unique periodic function
$f\colon \mathbb{R} \to \mathbb{R}$ with period $h$ such that $f|_{[a,b)} = k$.

2. Let $g\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $g(a) = g(b)$. Then there is a
unique periodic function $f\colon \mathbb{R} \to \mathbb{R}$ with period $h$ such that $f|_{[a,b]} = g$.


Proof. 


(1) We first show existence. Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined as follows. Let $x \in \mathbb{R}$. By
Exercise 2.6.14 (1) we know that there is a unique $n \in \mathbb{Z}$ such that $a + (n-1)h \le x <
a + nh$. Hence $x - (n-1)h \in [a,b)$. We then let $f(x) = k(x - (n-1)h)$. If $x \in [a,b)$,
then by the uniqueness of $n$ it must be the case that $n = 1$, and hence $f(x) = k(x)$.
Therefore $f|_{[a,b)} = k$. Next, we note that because $a + (n-1)h \le x < a + nh$, then
$a + [(n+1) - 1]h \le x + h < a + (n+1)h$. It follows by the uniqueness of $n$ that
$f(x+h)$ is defined by $f(x+h) = k(x + h - [(n+1) - 1]h) = k(x - (n-1)h) = f(x)$.
Hence $f$ is periodic with period $h$.

We now show uniqueness. Let $p\colon \mathbb{R} \to \mathbb{R}$ be a periodic function with period $h$
such that $p|_{[a,b)} = k$. Let $y \in \mathbb{R}$. By Exercise 2.6.14 (1) we know that there is a unique
$m \in \mathbb{Z}$ such that $a + (m-1)h \le y < a + mh$. Hence $y - (m-1)h \in [a,b)$. Using the
fact that $p$ and $f$ are periodic with period $h$, and that $p|_{[a,b)} = k = f|_{[a,b)}$, we see that
$p(y) = p(y - (m-1)h) = k(y - (m-1)h) = f(y - (m-1)h) = f(y)$. Hence $p = f$,
and we conclude that $f$ is unique.


(2) The proof of this part of the lemma is very similar to the proof of Part (1), 
and we omit the details. 


Lemma 7.3.2 is usually taken for granted, and often one just says informally 
“extend the function k periodically with period h” or a similar phrase. However, the 
proof of this lemma uses Exercise 2.6.14 (1), which relies upon Corollary 2.6.8 (1), 
which relies upon the Archimedean Property (Theorem 2.6.7), which in turn relies 
upon the Least Upper Bound Property of the real numbers. Hence, the ability to 
construct periodic extensions is not a trivial matter, even if it appears simple intuitively. 

We can now use Lemma 7.3.2 to make the following definition. 


Definition 7.3.3. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$h = b - a$.


1. Let $k\colon [a,b) \to \mathbb{R}$ be a function. The periodic extension of $k$ is the unique
periodic function $f\colon \mathbb{R} \to \mathbb{R}$ with period $h$ such that $f|_{[a,b)} = k$.

2. Let $g\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $g(a) = g(b)$. The periodic
extension of $g$ is the unique periodic function $f\colon \mathbb{R} \to \mathbb{R}$ with period $h$ such
that $f|_{[a,b]} = g$. △
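
The construction in the proof of Lemma 7.3.2 (1) translates directly into a short program. The following is a minimal sketch under stated assumptions: the name `periodic_extension` is hypothetical, floating-point numbers stand in for real numbers, and the integer written as $n - 1$ in the proof is computed with `math.floor`.

```python
import math
from typing import Callable


def periodic_extension(
    k: Callable[[float], float], a: float, b: float
) -> Callable[[float], float]:
    """Return the periodic extension, with period b - a, of a function k given on [a, b)."""
    if not a < b:
        raise ValueError("the interval [a, b) must be non-degenerate")
    h = b - a

    def f(x: float) -> float:
        # m plays the role of n - 1 in the proof: a + m*h <= x < a + (m + 1)*h.
        m = math.floor((x - a) / h)
        y = x - m * h
        # Rounding can leave y just outside [a, b); move it back by one period.
        if y < a:
            y += h
        elif y >= b:
            y -= h
        return k(y)

    return f
```

For example, `periodic_extension(lambda t: t, 0.0, 1.0)` is a sawtooth function with period 1. For Part (2) of the definition, the same code can be applied to the restriction of $g$ to $[a,b)$, because $g(a) = g(b)$.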


The following lemma relates the behavior of a function to the behavior of its 
periodic extension. 


Lemma 7.3.4. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval. Let
$g\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $g(a) = g(b)$. Let $f\colon \mathbb{R} \to \mathbb{R}$ be the periodic
extension of $g$.


1. If $g$ is continuous, then $f$ is continuous and bounded.
2. If $g$ is differentiable, and if $g'(a) = g'(b)$, where $g'(a)$ and $g'(b)$ are one-sided
derivatives, then $f$ is differentiable, and $f'$ is the periodic extension of $g'$.


Proof. Let h=b—a, so that f has period h. 


(1) Suppose that $g$ is continuous. Let $x \in \mathbb{R}$. By Exercise 2.6.14 (1) we know
that there is a unique $n \in \mathbb{Z}$ such that $a + (n-1)h \le x < a + nh$. Hence
$x \in (a + (n-2)h, a + nh)$. By Lemma 2.3.7 (2) there is some $\delta > 0$ such that
$(x - \delta, x + \delta) \subseteq (a + (n-2)h, a + nh)$.

Using the notation of Exercise 3.3.11, observe that $[a + (n-2)h, a + (n-1)h] =
[a,b] + (n-2)h$ and $[a + (n-1)h, a + nh] = [a,b] + (n-1)h$. It follows from
Exercise 3.3.11 that $f|_{[a+(n-2)h,\,a+(n-1)h]}$ and $f|_{[a+(n-1)h,\,a+nh]}$ are continuous. Using the
Pasting Lemma (Lemma 3.3.10) we see that $f|_{[a+(n-2)h,\,a+nh]}$ is continuous. By
Exercise 3.3.2 (2) the restriction of $f|_{[a+(n-2)h,\,a+nh]}$ to $(x - \delta, x + \delta)$ is continuous, which
means that $f|_{(x-\delta,\,x+\delta)}$ is continuous. Because $\mathbb{R} \cap (x - \delta, x + \delta) = (x - \delta, x + \delta)$, it
follows from Exercise 3.3.2 (1) that $f$ is continuous at $x$.

We know by Corollary 3.4.6 that $g$ is bounded. For each $x \in \mathbb{R}$, we know that
$f(x) = g(y)$ for some $y \in [a,b]$. It follows that $f$ is bounded.


(2) The proof of this part of the lemma is very similar to the proof of Part (1) of 
the lemma, though we replace Exercise 3.3.11 with the Chain Rule (Theorem 4.3.3), 
and we replace Exercise 3.3.2 (2), Exercise 3.3.2 (1) and the Pasting Lemma with 
Exercise 4.2.3 (5), Exercise 4.2.3 (1) and Exercise 4.3.7 (1), respectively; we omit the 
details. 


Prior to defining arcsine, we have one more preliminary step, which is the definition
of the number $\pi$. Informally, we are used to thinking of $\pi$ as the ratio of the
circumference of a circle to its diameter, and also as the number that appears in the
well-known formula for the area of a circle in terms of its radius. We are also used
to using the decimal expansion of $\pi$, which is $\pi = 3.14159\ldots$. However, as much
as our intuitive conception of $\pi$ is correct, simply saying that $\pi$ is the ratio of the
circumference of a circle to its diameter is not a rigorous definition, because it would
first be necessary to prove that this ratio is the same in all circles. This geometric fact
about circles will be proved in Section 7.4; at present we will take the quicker route of
defining $\pi$ as an improper integral, which is less geometrically appealing, but allows
us to proceed more directly to the definition of the sine and cosine functions.

To see that the following definition makes sense, we need some preliminary
observations. Using some facts about continuity that we have encountered, the reader
can verify that the function $f\colon (-1,1) \to \mathbb{R}$ defined by $f(x) = \frac{1}{\sqrt{1-x^2}} = (1 - x^2)^{-\frac{1}{2}}$
for all $x \in (-1,1)$ is continuous. It then follows from Theorem 5.4.11 that $f$ is
locally integrable. Finally, Exercise 6.4.5 (2) implies that the improper integral in the
following definition is convergent.


Definition 7.3.5. The number $\pi$ is defined by

$$\pi = 2\int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx.$$ △

We will not give a rigorous calculation of the numerical value of $\pi$ as $3.14159\ldots$
at this point, both because we will not need it, and because we do not yet have the tools
to do so. We will give a proof that $\pi$ is an irrational number in Theorem 7.4.5, and a
calculation of the first few digits of the decimal expansion of $\pi$ in Example 10.4.17.

We are now ready to define the arcsine function, from which we will then define
the sine function.


Definition 7.3.6. The arcsine function is the function $\arcsin\colon [-1,1] \to \mathbb{R}$ defined
by

$$\arcsin x = \int_0^x \frac{1}{\sqrt{1-t^2}}\,dt$$

for all $x \in [-1,1]$, where the integral is improper when $x = 1$ and $x = -1$. △

The fact that the improper integral in Definition 7.3.6 is convergent when $x = 1$ is
due, as remarked above, to Exercise 6.4.5 (2); it is left to the reader to verify that the
improper integral is also convergent when $x = -1$.

We now state those properties of arcsine that we will need for our construction of 
sine. 


Lemma 7.3.7.


1. $\arcsin(-x) = -\arcsin x$ for all $x \in [-1,1]$.
2. $\arcsin 0 = 0$, and $\arcsin 1 = \frac{\pi}{2}$, and $\arcsin(-1) = -\frac{\pi}{2}$.
3. The function $\arcsin$ is differentiable on $(-1,1)$, and $\arcsin' x = \frac{1}{\sqrt{1-x^2}}$ for all
$x \in (-1,1)$.
4. $\lim_{x \to 1^-} \arcsin' x = \infty$ and $\lim_{x \to -1^+} \arcsin' x = \infty$.
5. The function $\arcsin$ is continuous.
6. The function $\arcsin$ is strictly increasing.
7. The function $\arcsin\colon [-1,1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$ is bijective.


Proof. 


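(1) Let $x \in [-1,1]$. Then using the substitution $u = -t$, which implies $du = -dt$,
we see that

$$\arcsin(-x) = \int_0^{-x} \frac{1}{\sqrt{1-t^2}}\,dt = \int_0^{x} \frac{1}{\sqrt{1-(-u)^2}}\,(-1)\,du = -\int_0^{x} \frac{1}{\sqrt{1-u^2}}\,du = -\arcsin x.$$
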
For the sake of brevity we have used the standard notation for substitution found in 
calculus courses. To make this substitution rigorous, it would be necessary to use 
Integration by Substitution for Definite Integrals (Theorem 5.7.4) when x € (—1,1), 
and Integration by Substitution for Type 2 Improper Integrals (Theorem 6.4.12) when 
x = 1 and x = —1, where these theorems are applicable because the functions involved 
are continuous and differentiable as needed; we omit the details. 


(2) This part of the lemma follows immediately from the definition of arcsin, the 
definition of $\pi$ and Part (1) of this lemma.


(3) This part of the lemma follows immediately from the definition of arcsin and 
the Fundamental Theorem of Calculus Version I (Theorem 5.6.2). 


(4) This part is left to the reader in Exercise 7.3.4. 


(5) The continuity of $\arcsin$ at all points in $(-1,1)$ follows from Part (3) of this
lemma and Theorem 4.2.4. The definition of $\arcsin 1$ as an improper integral means
that

$$\arcsin 1 = \int_0^1 \frac{1}{\sqrt{1-t^2}}\,dt = \lim_{y \to 1^-} \int_0^y \frac{1}{\sqrt{1-t^2}}\,dt = \lim_{y \to 1^-} \arcsin y.$$

It follows from the one-sided analog of Lemma 3.3.2 that $\arcsin$ is continuous at $x = 1$.
A similar argument works for $x = -1$, and we omit the details.


(6) We see by Part (3) of this lemma that $\arcsin' x = \frac{1}{\sqrt{1-x^2}} > 0$ for all $x \in (-1,1)$.
Using Part (5) of this lemma, we can apply Theorem 4.5.2 (2) to $\arcsin$, and it follows
that $\arcsin$ is strictly increasing.


(7) This part of the lemma follows from Exercise 4.6.3 (1) together with the 
previous parts of this lemma. 


We are finally ready to give the definition of the sine and cosine functions. The
following definition makes sense because of Lemma 7.3.7 (7), which says that the
function $\arcsin\colon [-1,1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$ is bijective, and hence has an inverse function.


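Definition 7.3.8. The sine function is the function $\sin\colon \mathbb{R} \to \mathbb{R}$ defined as follows.
Let $f\colon [-\frac{\pi}{2}, \frac{3\pi}{2}] \to \mathbb{R}$ be defined by

$$f(x) = \begin{cases} \arcsin^{-1}(x), & \text{if } x \in [-\frac{\pi}{2}, \frac{\pi}{2}] \\ \arcsin^{-1}(\pi - x), & \text{if } x \in [\frac{\pi}{2}, \frac{3\pi}{2}]. \end{cases}$$

The function $f$ is well-defined, because $\arcsin^{-1}(\frac{\pi}{2}) = \arcsin^{-1}(\pi - \frac{\pi}{2})$. Using the
fact that $f(\frac{3\pi}{2}) = \arcsin^{-1}(\pi - \frac{3\pi}{2}) = \arcsin^{-1}(-\frac{\pi}{2}) = f(-\frac{\pi}{2})$, we can apply
Definition 7.3.3 (2) to the function $f$, and we let $\sin\colon \mathbb{R} \to \mathbb{R}$ be the periodic extension of
$f$. △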
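
The chain of definitions (an integral, then $\pi$, then arcsine, then its inverse, then a reflection and a periodic extension) can be imitated numerically. The sketch below is only an illustration under several assumptions that are not in the text: all function names are hypothetical, Simpson's rule replaces the integral, bisection replaces the inverse function, and the substitution $t = 1 - s^2$, which turns $\int_0^x \frac{1}{\sqrt{1-t^2}}\,dt$ into $\int_{\sqrt{1-x}}^{1} \frac{2}{\sqrt{2-s^2}}\,ds$ for $x \in [0,1]$, is used to avoid the singularity at $t = 1$.

```python
import math


def simpson(f, a: float, b: float, n: int = 200) -> float:
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def arcsin_integral(x: float) -> float:
    """Approximate the integral of 1/sqrt(1 - t^2) from 0 to x, for x in [-1, 1]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    if x < 0:
        return -arcsin_integral(-x)  # Lemma 7.3.7 (1)
    # After t = 1 - s^2 the integrand 2/sqrt(2 - s^2) is smooth on [sqrt(1 - x), 1].
    return simpson(lambda s: 2.0 / math.sqrt(2.0 - s * s), math.sqrt(1.0 - x), 1.0)


PI = 2.0 * arcsin_integral(1.0)  # Definition 7.3.5, approximately


def arcsin_inverse(y: float) -> float:
    """Invert arcsin on [-pi/2, pi/2] by bisection, using that arcsin is increasing."""
    lo, hi = -1.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if arcsin_integral(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sine(x: float) -> float:
    """Numerical imitation of Definition 7.3.8."""
    # Use the period 2*pi to move x into [-pi/2, 3*pi/2).
    y = x - 2.0 * PI * math.floor((x + PI / 2.0) / (2.0 * PI))
    if y > PI / 2.0:
        y = PI - y  # second case in the definition of f
    y = max(-PI / 2.0, min(PI / 2.0, y))  # guard against rounding
    return arcsin_inverse(y)
```

On sample inputs the values of `sine` agree with the library function `math.sin` to several decimal places, which suggests that the construction, though roundabout, does produce the familiar function.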
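
The following theorem states a few of the basic properties of sine.


Theorem 7.3.9.


1. $\sin 0 = 0$, and $\sin(\frac{\pi}{2}) = 1$, and $\sin(-\frac{\pi}{2}) = -1$.

2. $\sin$ is periodic with period $2\pi$.

3. $\sin(\pi - x) = \sin x$ for all $x \in \mathbb{R}$.

4. $0 < \sin x < 1$ for all $x \in (0, \frac{\pi}{2})$, and $-1 < \sin x < 1$ for all $x \in (-\frac{\pi}{2}, \frac{\pi}{2})$, and
$-1 \le \sin x \le 1$ for all $x \in \mathbb{R}$.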


Proof. Left to the reader in Exercise 7.3.5. 


We now turn to the cosine function. In the approach to sine and cosine based 
upon the unit circle, we define each of these two functions independently, whereas 
here we define cosine in terms of sine. The following definition makes sense by 
Theorem 7.3.9 (4). 

We use the common notation $\sin^2 x$ as an abbreviation for $[\sin x]^2$, and similarly
for cosine.


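Definition 7.3.10. The cosine function is the function $\cos\colon \mathbb{R} \to \mathbb{R}$ defined by

$$\cos x = \begin{cases} \sqrt{1 - \sin^2 x}, & \text{if } x \in [-\frac{\pi}{2} + 2\pi n, \frac{\pi}{2} + 2\pi n] \text{ for some } n \in \mathbb{Z} \\ -\sqrt{1 - \sin^2 x}, & \text{if } x \in [\frac{\pi}{2} + 2\pi n, \frac{3\pi}{2} + 2\pi n] \text{ for some } n \in \mathbb{Z}. \end{cases}$$ △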
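
The sign rule in Definition 7.3.10 can be sketched as follows. This is an illustration only: the name `cosine_from_sine` is hypothetical, and the library values `math.sin` and `math.pi` are assumed to stand in for the sine function and the number $\pi$ constructed in this section.

```python
import math


def cosine_from_sine(x: float, sine=math.sin) -> float:
    """Compute cos x from sin x using the case distinction of Definition 7.3.10."""
    two_pi = 2.0 * math.pi
    # Shift x by a multiple of 2*pi so that y lies in [-pi/2, 3*pi/2).
    y = x - two_pi * math.floor((x + math.pi / 2.0) / two_pi)
    # max(...) guards against a tiny negative value caused by rounding.
    root = math.sqrt(max(0.0, 1.0 - sine(x) ** 2))
    # First case: y in [-pi/2, pi/2]; second case: y in (pi/2, 3*pi/2).
    return root if y <= math.pi / 2.0 else -root
```

At the endpoints $\frac{\pi}{2} + 2\pi n$ and $\frac{3\pi}{2} + 2\pi n$ both cases give the value $0$, which is why the overlapping intervals in the definition cause no ambiguity.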


The following theorem states a few of the basic properties of cosine. This theorem 
follows immediately from Definition 7.3.10 and Theorem 7.3.9, and we omit the 
details. 


Theorem 7.3.11.


1. $\cos 0 = 1$, and $\cos(\frac{\pi}{2}) = 0$, and $\cos(-\frac{\pi}{2}) = 0$.

2. $\cos$ is periodic with period $2\pi$.

3. $0 < \cos x \le 1$ for all $x \in (-\frac{\pi}{2}, \frac{\pi}{2})$, and $0 \le \cos x \le 1$ for all $x \in [-\frac{\pi}{2}, \frac{\pi}{2}]$, and
$-1 \le \cos x \le 1$ for all $x \in \mathbb{R}$.

4. $\sin^2 x + \cos^2 x = 1$ for all $x \in \mathbb{R}$.


Next, we look at the derivatives of sine and cosine. The idea is to use what we saw 
about derivatives of inverse functions in Section 4.6, though we have a slight problem
because the function arcsin is not differentiable at x = 1 and x = —1. Fortunately, we 
will be able to resolve this problem by using Theorem 6.3.9. 


Theorem 7.3.12. 


1. The function sin is differentiable and sin’ = cos. 
2. The function cos is differentiable and cos’ = — sin. 


Proof. We prove Part (1), leaving the remaining part to the reader in Exercise 7.3.6. 


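(1) First, we look at $\sin$ restricted to $(-\frac{\pi}{2}, \frac{\pi}{2})$. By definition we know that $\sin$
restricted to $(-\frac{\pi}{2}, \frac{\pi}{2})$ equals $\arcsin^{-1}|_{(-\frac{\pi}{2}, \frac{\pi}{2})}$. By Lemma 7.3.7 (3) we know that $\arcsin$ is
differentiable on $(-1,1)$, and $\arcsin' x = \frac{1}{\sqrt{1-x^2}}$ for all $x \in (-1,1)$. Because $\frac{1}{\sqrt{1-x^2}} \neq
0$ for all $x \in (-1,1)$, it follows from Theorem 4.6.4 (3) that $\sin$ restricted to $(-\frac{\pi}{2}, \frac{\pi}{2})$
is differentiable. Moreover, if $x \in (-\frac{\pi}{2}, \frac{\pi}{2})$, Theorem 4.6.4 (4) implies that

$$\sin' x = [\arcsin^{-1}]'(x) = \frac{1}{\arcsin'(\arcsin^{-1}(x))} = \frac{1}{\arcsin'(\sin x)} = \frac{1}{\frac{1}{\sqrt{1 - \sin^2 x}}} = \sqrt{1 - \sin^2 x} = \cos x.$$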

Next, consider $\arcsin$ restricted to $(-1,1]$. It follows from various parts of
Lemma 7.3.7 that $\arcsin|_{(-1,1]}$ satisfies the hypotheses of Theorem 6.3.9, and we then use
that theorem to deduce that $\sin$ is differentiable on $(-\frac{\pi}{2}, \frac{\pi}{2}]$, and that $\sin'(\frac{\pi}{2}) = 0$,
where $\sin'(\frac{\pi}{2})$ is a one-sided derivative. It then follows from Theorem 7.3.11 (1) that
$\sin'(\frac{\pi}{2}) = 0 = \cos(\frac{\pi}{2})$. A similar argument shows that $\sin$ is differentiable on $[-\frac{\pi}{2}, \frac{\pi}{2})$,
and that $\sin'(-\frac{\pi}{2}) = 0 = \cos(-\frac{\pi}{2})$; we omit the details.


Because $\sin'(\frac{\pi}{2}) = 0$, where $\sin'(\frac{\pi}{2})$ is a one-sided derivative, it now follows from
Exercise 4.3.7 (2) that $\sin$ is differentiable on $[-\frac{\pi}{2}, \frac{3\pi}{2}]$, and that $\sin' x = -\sin'(\pi - x)$
for all $x \in [\frac{\pi}{2}, \frac{3\pi}{2}]$. If $x \in [\frac{\pi}{2}, \frac{3\pi}{2}]$, then $\pi - x \in [-\frac{\pi}{2}, \frac{\pi}{2}]$, and using the previous
paragraph and Theorem 7.3.9 (3), we see that $\sin' x = -\sqrt{1 - \sin^2(\pi - x)} =
-\sqrt{1 - \sin^2 x} = \cos x$.

It follows from the above that $\sin'(\frac{3\pi}{2}) = -\sin'(\pi - \frac{3\pi}{2}) = -\sin'(-\frac{\pi}{2}) = 0 =
\sin'(-\frac{\pi}{2})$. We therefore use Lemma 7.3.4 (2) to deduce that $\sin$ is differentiable, which
is Part (1) of this theorem, and that $\sin'$ is the periodic extension of the restriction
of $\sin'$ to $[-\frac{\pi}{2}, \frac{3\pi}{2}]$. Because $\sin' = \cos$ on $[-\frac{\pi}{2}, \frac{3\pi}{2}]$, and because $\cos$ is the periodic
extension of its restriction to $[-\frac{\pi}{2}, \frac{3\pi}{2}]$, then the uniqueness in Lemma 7.3.2 (2) implies
that $\sin' = \cos$ on $\mathbb{R}$.


We conclude our discussion of the trigonometric functions with the following 
trigonometric identities, which can be proved geometrically without the use of calcu- 
lus, but which have a very nice proof using Theorem 7.3.12. 


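Theorem 7.3.13. Let $x, y \in \mathbb{R}$.


1. $\sin(-x) = -\sin x$.
2. $\cos(-x) = \cos x$.
3. $\sin(x + y) = \sin x \cos y + \cos x \sin y$.
4. $\sin(x - y) = \sin x \cos y - \cos x \sin y$.
5. $\cos(x + y) = \cos x \cos y - \sin x \sin y$.
6. $\cos(x - y) = \cos x \cos y + \sin x \sin y$.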

Proof. We prove all parts of the theorem together. Let $c \in \mathbb{R}$. Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined
by $f(w) = \sin w \cos(c + w) - \sin(c + w) \cos w$ for all $w \in \mathbb{R}$. Using Theorem 7.3.12,
the Product Rule (Theorem 4.3.1 (4)) and the Chain Rule (Theorem 4.3.3), we see
that

$$f'(w) = \cos w \cos(c + w) - \sin w \sin(c + w) - \cos(c + w) \cos w + \sin(c + w) \sin w = 0$$

for all $w \in \mathbb{R}$. It follows from Lemma 4.4.7 (1) that there is some $C \in \mathbb{R}$ such that
$f(w) = C$ for all $w \in \mathbb{R}$. By Theorem 7.3.9 (1) and Theorem 7.3.11 (1) we see
that $C = f(0) = \sin 0 \cos(c + 0) - \sin(c + 0) \cos 0 = -\sin c$. Hence $\sin w \cos(c + w) -
\sin(c + w) \cos w = -\sin c$ for all $w \in \mathbb{R}$. If we let $c = -x$ and $w = x$, we deduce
that $\sin x \cos 0 - \sin 0 \cos x = -\sin(-x)$, which implies that $\sin x = -\sin(-x)$, and
hence $\sin(-x) = -\sin x$. If we let $c = x - y$ and $w = y$, we deduce that $\sin y \cos x -
\sin x \cos y = -\sin(x - y)$, which implies that $\sin(x - y) = \sin x \cos y - \cos x \sin y$.
Because this last equality holds for all $x, y \in \mathbb{R}$, we then see that $\sin(x + y) =
\sin(x - (-y)) = \sin x \cos(-y) - \cos x \sin(-y) = \sin x \cos y + \cos x \sin y$.

By differentiating both sides of the equation $\sin(-x) = -\sin x$, and using the
Chain Rule, we obtain $-\cos(-x) = -\cos x$, and hence $\cos(-x) = \cos x$. Finally, by
differentiating both sides of the equations $\sin(x + y) = \sin x \cos y + \cos x \sin y$ and
$\sin(x - y) = \sin x \cos y - \cos x \sin y$ with respect to $x$ (thinking of $y$ as a constant), we
obtain $\cos(x + y) = \cos x \cos y - \sin x \sin y$ and $\cos(x - y) = \cos x \cos y + \sin x \sin y$.
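
The key step of the proof, that $f(w)$ does not depend on $w$ and equals $-\sin c$, is easy to observe numerically. The following sketch is an illustration and not part of the proof: the name `constant_candidate` is hypothetical, and the library functions `math.sin` and `math.cos` are assumed to stand in for the functions defined in this section.

```python
import math


def constant_candidate(w: float, c: float) -> float:
    """The function f(w) = sin w cos(c + w) - sin(c + w) cos w from the proof."""
    return math.sin(w) * math.cos(c + w) - math.sin(c + w) * math.cos(w)
```

For a fixed $c$, evaluating `constant_candidate(w, c)` at several values of $w$ gives the same number up to rounding error, namely `-math.sin(c)`.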


Reflections 


The method of defining the sine and cosine functions given in this section is 
surprisingly complicated, and it appears to be quite different from the way one 
sees these functions defined in trigonometry and precalculus courses, where the 
trigonometric functions are based upon the unit circle. In fact, the difference between 
the two approaches is more a matter of style (that is, a rigorous treatment in contrast 
to an informal one) than of substance. If one looks at the proof of Theorem 7.4.3, 
it will be observed that the integral used to compute the circumference of a circle
is the same integral (though with different limits of integration) as the one used to 
define the arcsine function in Definition 7.3.6. Essentially, the integral used to define 
arcsine is the rigorous replacement for using the length of an arc on the unit circle 
in the definition of sine and cosine. The lengthy details in the definition of sine and 
cosine reflect a number of technical complications that arise along the way: the fact 
that the integral is improper at x = 1, the need to define arcsine with the integral rather 
than sine, and the fact that we cannot obtain the entire sine function as the inverse of 
arcsine, and so we need to look at periodic extensions of functions. 

The above comments notwithstanding, there is one substantial difference between 
the way we think of the sine and cosine functions in the present section and in 
trigonometry and precalculus courses, which is that in those elementary courses we 
think of sine and cosine as functions of angles, and we look at the relation of sine and 
cosine to triangles, whereas at present we think of sine and cosine simply as functions 
ℝ → ℝ, with no mention of angles or triangles. If one wants to think of sine and 
cosine as functions of angles, then one can think of the number x in sin x and cos x 
as representing an angle measured in radians. It is important to stress that measuring 
angles in degrees has no place in calculus; degrees (and any measure of angles other 
than radians) are arbitrary, and do not work properly with derivatives and integrals. 
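
To see concretely what goes wrong with degrees, suppose that we define a "sine in degrees" by x ↦ sin(πx/180). By the Chain Rule its derivative is (π/180) cos(πx/180), not the corresponding "cosine in degrees." The following Python sketch illustrates this numerically; the names sin_deg, cos_deg and central_difference are hypothetical helpers introduced only for this illustration, and the difference quotient is merely an approximation of the derivative.

```python
import math


def sin_deg(x):
    """Sine of an angle measured in degrees."""
    return math.sin(math.pi * x / 180.0)


def cos_deg(x):
    """Cosine of an angle measured in degrees."""
    return math.cos(math.pi * x / 180.0)


def central_difference(func, x, step=1e-4):
    """Symmetric difference quotient, an approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


x = 60.0
slope = central_difference(sin_deg, x)

# The slope is close to (pi/180) * cos_deg(x), and far from cos_deg(x).
print(slope, math.pi / 180.0 * cos_deg(x), cos_deg(x))
```

The extra factor π/180 appears in every derivative and integral formula when degrees are used, which is the sense in which radians are the natural unit for calculus.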

In addition to the rigorous definition of the sine and cosine functions using 
integrals, there is another widely used rigorous definition of these functions in terms 
of power series; such a definition is given in Exercise 10.4.13. The definition of 
sine and cosine in terms of power series is quicker and easier than the definition 
given in the present section, though the ease is only apparent, because it relies upon 
various facts about power series that need to be proved rigorously; the definition 
used in the present section, by contrast, relies only upon what we have seen so far 
in this text. Moreover, the definition of sine and cosine using power series has no 
direct relation—within the realm of real analysis—to the informal definition of these 
functions using the unit circle; the power series method simply produces functions 
that behave the way one would expect sine and cosine to behave, for example that 
they have the anticipated derivatives. (There is a relation between the series definition 
of sine and cosine and the unit circle via the complex numbers, but that is beyond 
the scope of this book.) For these reasons, we have included the more cumbersome 
definition in the present section, especially for the reader who might not make it to 
the sections on power series. 
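
For the reader who is curious about the power series approach, the following Python sketch computes partial sums of the standard series x − x³/3! + x⁵/5! − ⋯ and compares them with the library value of sine. We assume here, without reproducing Exercise 10.4.13, that this is the series in question; the function name sin_partial_sum is hypothetical, and nothing about convergence is proved by such a computation.

```python
import math


def sin_partial_sum(x, terms):
    """Sum of the first `terms` terms of x - x^3/3! + x^5/5! - ..."""
    total = 0.0
    term = x  # the term with index k = 0
    for k in range(terms):
        total += term
        # Pass from the term of index k to the term of index k + 1.
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


for terms in (1, 2, 4, 8):
    print(terms, sin_partial_sum(1.0, terms), math.sin(1.0))
```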


Exercises 


Exercise 7.3.1. Let f: ℝ → ℝ be a function. Prove that if f is continuous and 
periodic, then f is bounded. 


Exercise 7.3.2. [Used in Exercise 7.3.5.] Let [a,b] ⊆ ℝ be a non-degenerate closed 
bounded interval, let h = b − a and let g: [a,b] → ℝ be a function. Suppose that 
g(a + b − x) = g(x) for all x ∈ [a,b]. Then g(b) = g(a + b − b) = g(a). By 
Exercise 7.3.2 (2) there is a unique periodic function f: ℝ → ℝ with period h such that 
f|[a,b] = g. Prove that f(a + b − x) = f(x) for all x ∈ ℝ. [Use Exercise 2.6.14.] 


Exercise 7.3.3. [Used in Section 7.4 and Exercise 7.4.3.] Prove that π > 0. 


Exercise 7.3.4. [Used in Lemma 7.3.7.] Prove Lemma 7.3.7 (4). 


Exercise 7.3.5. [Used in Theorem 7.3.9.] Prove Theorem 7.3.9. Use only concepts 
and results stated prior to that theorem. [Use Exercise 4.6.3 (2) and Exercise 7.3.2.] 


Exercise 7.3.6. [Used in Theorem 7.3.12.] Prove Theorem 7.3.12 (2). 
[Use Exercise 4.4.7.] 


Exercise 7.3.7. [Used in Exercise 7.3.8.] Let x ∈ ℝ. 

(1) Prove that sin(π/2 − x) = cos x, and that cos(π/2 − x) = sin x. 
(2) Prove that sin(2x) = 2 sin x cos x, and that cos(2x) = cos²x − sin²x = 2 cos²x − 1 = 
1 − 2 sin²x. 


Exercise 7.3.8. [Used in Example 10.4.17.] Prove that sin(π/6) = 1/2. Use only what we 
have seen in this text; in particular, you may not use facts about triangles that we have 
not proved. [Use Exercise 7.3.7.] 

Exercise 7.3.9. [Used in Exercise 7.3.10 and Exercise 10.4.13.] Using Theorem 7.3.12 
we see that sin″ x = −sin x and cos″ x = −cos x for all x ∈ ℝ. That is, both sin and 
cos satisfy the differential equation f″(x) + f(x) = 0 for all x ∈ ℝ. (A differential 
equation is simply an equation that involves derivatives of functions. The reader who 
is not familiar with differential equations should not be concerned; we will not be 
using any facts from the study of differential equations in this text.) We will see 
in Part (3) of this exercise that this differential equation, together with particular 
values for f(0) and f′(0), completely characterizes the sine and cosine functions. Our 
approach follows [Spi67, Chapter 15]. For this exercise, use only what has been 
proved in this text; in particular, do not use facts about differential equations that you 
have seen elsewhere but we have not proved. 

Suppose that f: ℝ → ℝ is twice differentiable, and that f″(x) + f(x) = 0 for all 
x ∈ ℝ. 


(1) Prove that if f(0) = 0 and f′(0) = 0, then f(x) = 0 for all x ∈ ℝ. The idea 
is to multiply both sides of the equation f″(x) + f(x) = 0 by something that 
makes the left-hand side of the equation into the derivative of something. 

(2) Prove that if f(0) = B and f′(0) = C for some B, C ∈ ℝ, then f(x) = C sin x + 
B cos x for all x ∈ ℝ. 

(3) Prove that sin is the unique function g: ℝ → ℝ that satisfies g(0) = 0, and 
g′(0) = 1, and g″(x) + g(x) = 0 for all x ∈ ℝ, and that cos is the unique function 
h: ℝ → ℝ that satisfies h(0) = 1, and h′(0) = 0, and h″(x) + h(x) = 0 for all 
x ∈ ℝ. 


Exercise 7.3.10. Use Exercise 7.3.9 (1) to give an alternative proof of Theo- 
rem 7.3.13 (1). 


7.4 More about π 


The definition that we gave for the number π in Definition 7.3.5 was technically 
convenient from the point of view of defining sine and cosine, but it is not an entirely 
satisfactory definition, because it does not bear a direct resemblance to the standard 
approach to π that we all learned when we were young, which is to think of π as the 
ratio between the circumference and the diameter of a circle. In the present section we 
will show that π as we have defined it satisfies the expected geometric properties, and 
so no harm was done defining π as we did. We will also see a proof that π is irrational, 
which is not related to the issues of circumference and area, but is a fascinating fact 
about π that we have the tools to prove. 

Although we think of π as a number, specifically, the number 3.14159..., the 
symbol π also represents an idea that is more important than this numerical value. The 
number π is usually defined as the ratio of the circumference of a circle to its diameter. 
How do we know, however, that all circles have the same ratio of the circumference 
to the diameter? That question is often glossed over in elementary discussions of π, 
but if this ratio were different in different circles, then the definition of π as this ratio 
would make no sense. We will now use tools from real analysis to prove that this 
ratio is the same for all circles, and that this ratio equals π as we defined it in terms 
of an improper integral in Definition 7.3.5. That fact is the idea that the number π 
represents. The familiar formula C = πD, where C is the circumference of a circle 
and D is its diameter, immediately follows from this fact about π. There is, of course, 
another famous formula involving circles and π, namely, the formula A = πr², where 
A is the area of a circle, and r is its radius, and we will prove this formula as well. Our 
proofs will make use of the treatment of area and arc length in Section 5.9, and of 
improper integrals in Section 6.4. 

To compute the area of a circle, nothing about area beyond what was stated 
in Section 5.9 will be needed. By contrast, we will need one additional fact about 
arc length that was not discussed in that section, because when we compute the 
circumference of a circle via integration, we will need to deal with a function whose 
derivative is not bounded, and we will therefore need to use Type 2 improper integrals. 
In particular, we will need an improper integral version of Theorem 5.9.17, which 
we will see after the following lemma. The reader should first review the concept of 
rectifiability in Section 5.9. 


Lemma 7.4.1. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is bounded, that $f$ is continuous at $b$, that 
$f|_{[a,s]}$ is rectifiable for each $s \in (a,b)$ and that $\lim_{s \to b^-} L_a^s(f)$ exists. Then $f$ is rectifiable 
and 

\[ L_a^b(f) = \lim_{s \to b^-} L_a^s(f). \]

Proof. Suppose that $f$ is not rectifiable. Then the set $\{C(f,P) \mid P \text{ is a partition of } [a,b]\}$ 
is not bounded above. Let $L = \lim_{s \to b^-} L_a^s(f)$. Then there is some $\delta > 0$ such that 
$s \in (a,b)$ and $b - \delta < s < b$ imply $|L_a^s(f) - L| < 1$. Because $f$ is bounded there is some 
$M \in \mathbb{R}$ such that $|f(x)| \le M$ for all $x \in [a,b]$. Because the above set is not bounded 
above, there is some partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a,b]$ such that 

\[ C(f,P) > |L| + 1 + \sqrt{\delta^2 + 4M^2}. \]

Choose some $t \in (x_{n-1}, x_n)$ such that $|t - x_n| < \delta$. Because $x_n = b$, then 
$t \in (x_{n-1}, x_n) \cap (b - \delta, b)$, and in particular $b - \delta < t < b$. Then $|L_a^t(f) - L| < 1$. 
By Lemma 2.3.9 (7) we deduce that $|L_a^t(f)| - |L| < 1$, and hence $|L_a^t(f)| < |L| + 1$. 
Let $Q = \{x_0, x_1, \ldots, x_{n-1}, t, x_n\}$. Then $Q$ is a refinement of $P$, and by Lemma 5.9.14 
we know that $C(f,Q) \ge C(f,P)$. 

Let $R = \{x_0, x_1, \ldots, x_{n-1}, t\}$, which is a partition of $[a,t]$. Because $|f(x)| \le M$ for 
all $x \in [a,b]$, it follows that $|f(x_n) - f(t)| \le 2M$. Then 

\[ \sqrt{[x_n - t]^2 + [f(x_n) - f(t)]^2} < \sqrt{\delta^2 + 4M^2}. \]

Therefore 

\[
\begin{aligned}
C(f,Q) &= \sum_{i=1}^{n-1} \sqrt{[x_i - x_{i-1}]^2 + [f(x_i) - f(x_{i-1})]^2} \\
&\quad + \sqrt{[t - x_{n-1}]^2 + [f(t) - f(x_{n-1})]^2} + \sqrt{[x_n - t]^2 + [f(x_n) - f(t)]^2} \\
&< C(f|_{[a,t]}, R) + \sqrt{\delta^2 + 4M^2},
\end{aligned}
\]

and hence 

\[ C(f|_{[a,t]}, R) + \sqrt{\delta^2 + 4M^2} > C(f,Q) \ge C(f,P) > |L| + 1 + \sqrt{\delta^2 + 4M^2}. \]

It follows that $C(f|_{[a,t]}, R) > |L| + 1$. Because $f|_{[a,t]}$ is rectifiable, then $L_a^t(f)$ is the 
least upper bound of the set $\{C(f|_{[a,t]}, S) \mid S \text{ is a partition of } [a,t]\}$, and therefore 
$L_a^t(f) \ge C(f|_{[a,t]}, R) > |L| + 1$. However, we saw previously that $|L_a^t(f)| < |L| + 1$, 
which is a contradiction. We conclude that $f$ is rectifiable. Therefore $L_a^b(f)$ is defined. 

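We now show that $L_a^b(f) = \lim_{s \to b^-} L_a^s(f)$. Let $\varepsilon > 0$. Because $f$ is continuous at $b$, 
there is some $\delta_1 > 0$ such that $x \in [a,b]$ and $|x - b| < \delta_1$ imply $|f(x) - f(b)| < \frac{\varepsilon}{4}$. 

Because $L_a^b(f)$ is the least upper bound of the set $\{C(f,P) \mid P \text{ is a partition of } [a,b]\}$, 
it follows from Lemma 2.6.5 (1) that there is a partition $Z = \{z_0, z_1, \ldots, z_n\}$ of $[a,b]$ 
such that $L_a^b(f) - \frac{\varepsilon}{4} < C(f,Z) \le L_a^b(f)$. Let $\eta = \min\{\delta_1, \frac{\varepsilon}{4}, z_n - z_{n-1}\}$. 

Suppose that $u \in [a,b)$ and $b - \eta < u < b$. Then $u \in (z_{n-1}, z_n)$. Let 
$X = \{z_0, z_1, \ldots, z_{n-1}, u, z_n\}$. Then $X$ is a partition of $[a,b]$. Because $f|_{[a,u]}$ is rectifiable, 
Lemma 2.6.5 (1) implies that there is a partition $Y$ of $[a,u]$ such that 
$L_a^u(f) - \frac{\varepsilon}{4} < C(f|_{[a,u]}, Y) \le L_a^u(f)$. Let $W = X \cup Y$, and let $V = W \cap [a,u]$. Then $W$ is a 
refinement of $Z$, and $V$ is a refinement of $Y$. It follows from Lemma 5.9.14 that 
$C(f,W) \ge C(f,Z)$ and $C(f|_{[a,u]}, V) \ge C(f|_{[a,u]}, Y)$. Hence, using the definition of $L_a^b(f)$ 
and $L_a^u(f)$ as least upper bounds, we see that $L_a^b(f) - \frac{\varepsilon}{4} < C(f,W) \le L_a^b(f)$ and 
$L_a^u(f) - \frac{\varepsilon}{4} < C(f|_{[a,u]}, V) \le L_a^u(f)$. Therefore $|L_a^b(f) - C(f,W)| < \frac{\varepsilon}{4}$ and 
$|L_a^u(f) - C(f|_{[a,u]}, V)| < \frac{\varepsilon}{4}$. 

Because $V = W \cap [a,u]$, and because the partition $W$ has no points between $u$ 
and $z_n$, it follows that $C(f,W) = C(f|_{[a,u]}, V) + \sqrt{[z_n - u]^2 + [f(z_n) - f(u)]^2}$. Hence 
$|C(f,W) - C(f|_{[a,u]}, V)| = \sqrt{[z_n - u]^2 + [f(z_n) - f(u)]^2}$. By the choice of $u$ we know 
that $|u - b| < \eta$. Hence $|u - b| < \frac{\varepsilon}{4}$ and $|u - b| < \delta_1$. From the latter we deduce that 
$|f(u) - f(b)| < \frac{\varepsilon}{4}$. Observe that $z_n = b$. Then 

\[ |C(f,W) - C(f|_{[a,u]}, V)| = \sqrt{[z_n - u]^2 + [f(z_n) - f(u)]^2} < \sqrt{\left(\tfrac{\varepsilon}{4}\right)^2 + \left(\tfrac{\varepsilon}{4}\right)^2} < \frac{\varepsilon}{2}. \]

Finally, we see that 

\[
\begin{aligned}
|L_a^u(f) - L_a^b(f)| &= |L_a^u(f) - C(f|_{[a,u]}, V) + C(f|_{[a,u]}, V) - C(f,W) + C(f,W) - L_a^b(f)| \\
&\le |L_a^u(f) - C(f|_{[a,u]}, V)| + |C(f|_{[a,u]}, V) - C(f,W)| + |C(f,W) - L_a^b(f)| \\
&< \frac{\varepsilon}{4} + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} = \varepsilon.
\end{aligned}
\]

It follows that $\lim_{s \to b^-} L_a^s(f) = L_a^b(f)$. 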


Theorem 7.4.2. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let 
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is continuous on $[a,b]$ and continuously 
differentiable on $(a,b)$, and that $f'$ is bounded on $(a,s)$ for any $s \in (a,b)$. If $\sqrt{1 + [f']^2}$ 
is improperly integrable, then $f$ is rectifiable and 

\[ L_a^b(f) = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx, \]

where the integral is improper. 


Proof. Suppose that $\sqrt{1 + [f']^2}$ is improperly integrable. 
If $s \in (a,b)$, then $f|_{[a,s]}$ satisfies the hypotheses of Theorem 5.9.17, and it follows 
from that theorem that $f|_{[a,s]}$ is rectifiable and 

\[ L_a^s(f) = \int_a^s \sqrt{1 + [f'(x)]^2}\,dx. \]

Because $\sqrt{1 + [f']^2}$ is improperly integrable, then 

\[ \lim_{s \to b^-} L_a^s(f) = \lim_{s \to b^-} \int_a^s \sqrt{1 + [f'(x)]^2}\,dx = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx, \]

and so in particular $\lim_{s \to b^-} L_a^s(f)$ exists. 

By Theorem 3.4.6 the function $f$ is bounded. By hypothesis $f$ is continuous at $b$. 
We can therefore use Lemma 7.4.1 to deduce that $f$ is rectifiable and 

\[ L_a^b(f) = \lim_{s \to b^-} L_a^s(f) = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx. \]


We are now ready to look at circumferences of circles. Let (a,b) ∈ ℝ², and let 
r ∈ (0,∞). The circle of radius r centered at (a,b) is given by the equation (x − a)² + 
(y − b)² = r²; this equation is just a restatement of the Pythagorean Theorem. The 
following theorem shows that the ratio of the circumference to the diameter is the 
same for all circles, and that this ratio equals π, as we defined it in Definition 7.3.5. 


Theorem 7.4.3. Let $(a,b) \in \mathbb{R}^2$, and let $r \in (0,\infty)$. If $C$ is the circumference of the 
circle of radius $r$ centered at $(a,b)$, and if $D$ is the diameter of this circle, then $\frac{C}{D} = \pi$. 


Proof. Let $C$ be the circumference of the circle of radius $r$ centered at $(a,b)$, and let 
$D$ be the diameter of this circle. Then $D = 2r$. 

To compute $C$, we first solve the equation $(x-a)^2 + (y-b)^2 = r^2$ for $y$, and we 
obtain $y = b \pm \sqrt{r^2 - (x-a)^2}$, which yields two functions, namely, the one with 
the positive square root, which describes the upper semicircle, and the one with the 
negative square root, which describes the lower semicircle. Because of the top-to- 
bottom symmetry of the circle, in order to find the circumference of the circle, it will 
suffice to find the arc length of the upper semicircle and multiply it by 2. (A proof 
of this fact makes use of Exercise 5.9.11 and Exercise 5.9.12; the details are left to 
the reader.) Because of the left-to-right symmetry of the upper semicircle, in order to 
find the circumference of the circle, it will suffice to find the arc length of the upper 
right quarter of the circle and multiply it by 4. (A proof of this fact makes use of 
Exercise 5.9.14 and Exercise 5.9.13; again, the details are left to the reader.) 

Let $h\colon [a,a+r] \to \mathbb{R}$ be defined by $h(x) = b + \sqrt{r^2 - (x-a)^2}$ for all $x \in [a,a+r]$. 
We will show that $h$ is rectifiable, and it will then follow that $C = 4L_a^{a+r}(h)$. 

It is left to the reader to verify that $h$ is continuous on $[a,a+r]$ and continuously 
differentiable on $(a,a+r)$. If $s \in (a,a+r)$, then by Exercise 3.3.2 (2) we know that 
$h'$ is continuous on $[a,s]$, and Corollary 3.4.6 then implies that $h'$ is bounded on 
$[a,s]$, and hence on $(a,s)$. It can also be verified that 

\[ \sqrt{1 + [h'(x)]^2} = \frac{r}{\sqrt{r^2 - (x-a)^2}} \]

for all $x \in [a,a+r)$; again the details are left to the reader. We would like to apply 
Theorem 7.4.2 to the function $h$, and so we need to show that $\sqrt{1 + [h']^2}$ is improperly 
integrable. 

Let $g\colon [a,a+r] \to [0,1]$ be defined by $g(x) = \frac{x-a}{r}$ for all $x \in [a,a+r]$, and let 
$f\colon [0,1) \to \mathbb{R}$ be defined by $f(x) = \frac{1}{\sqrt{1-x^2}}$ for all $x \in [0,1)$. It is left to the reader 
to verify that $f$ is continuous, that $g$ is strictly increasing and differentiable, that 
$g'$ is integrable, that $g(a) = 0$ and $g(a+r) = 1$. Moreover, the reader can verify 
that 

\[ f(g(x))\,g'(x) = \frac{1}{\sqrt{r^2 - (x-a)^2}} = \frac{1}{r}\sqrt{1 + [h'(x)]^2} \]

for all $x \in [a,a+r)$. We now use 
Integration by Substitution for Improper Integrals (Theorem 6.4.12) to deduce that 
$\frac{1}{r}\sqrt{1 + [h']^2} = (f \circ g) \cdot g'$ is improperly integrable if and only if $f$ is improperly 
integrable, and if they are improperly integrable then 

\[ \frac{1}{r}\int_a^{a+r} \sqrt{1 + [h'(x)]^2}\,dx = \int_a^{a+r} f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(a+r)} f(x)\,dx = \int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx. \]


By combining Exercise 6.4.5 (2) and Exercise 6.4.7 (2) we see that $f$ is improperly 
integrable, which implies that $\sqrt{1 + [h']^2}$ is improperly integrable and 

\[ \int_a^{a+r} \sqrt{1 + [h'(x)]^2}\,dx = r\int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx. \]

We can now apply Theorem 7.4.2 to the function $h$, and it follows that $h$ is 
rectifiable and 

\[ L_a^{a+r}(h) = \int_a^{a+r} \sqrt{1 + [h'(x)]^2}\,dx = r\int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx. \]

Therefore 

\[ C = 4r\int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx, \]

and we conclude that 

\[ \frac{C}{D} = 2\int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx = \pi, \]

where the last equality holds by Definition 7.3.5. 
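
The conclusion of Theorem 7.4.3 can be illustrated numerically, though of course not proved, by computing the length of a polygon inscribed in the upper right quarter of a circle and forming the corresponding approximation of the ratio of circumference to diameter. The following Python sketch does so for a uniform partition of [a, a + r]; the function names and the sample circles are hypothetical choices made for this illustration, and math.pi is the library's floating-point approximation rather than the number defined in Definition 7.3.5.

```python
import math


def quarter_circle(x, a, b, r):
    """h(x) = b + sqrt(r^2 - (x - a)^2) on [a, a + r]."""
    # max(0.0, ...) guards against tiny negative values caused by rounding.
    return b + math.sqrt(max(0.0, r * r - (x - a) ** 2))


def polygonal_sum(a, b, r, pieces):
    """Length of the polygon through the graph of h over a uniform partition."""
    xs = [a + r * i / pieces for i in range(pieces + 1)]
    ys = [quarter_circle(x, a, b, r) for x in xs]
    return sum(
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) for i in range(pieces)
    )


def ratio_estimate(a, b, r, pieces):
    """Approximation of C / D, using C = 4 * (quarter arc length) and D = 2r."""
    return 4.0 * polygonal_sum(a, b, r, pieces) / (2.0 * r)


for a, b, r in [(0.0, 0.0, 1.0), (2.0, -1.0, 3.5)]:
    print(a, b, r, ratio_estimate(a, b, r, 20000), math.pi)
```

The estimates for the two circles agree to many decimal places, which is consistent with the fact that the ratio does not depend on the center or the radius.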


We now turn to the area of a circle. In the proof of the following theorem we will 
use Integration by Substitution and Integration by Parts. For the sake of brevity we will 
use the standard notation found in calculus courses for these techniques of integration, 
rather than the proper formulations stated in Theorem 5.7.4, Theorem 5.7.6 and 
Theorem 6.4.13. 


Theorem 7.4.4. Let $(a,b) \in \mathbb{R}^2$, and let $r \in (0,\infty)$. If $A$ is the area of the circle of 
radius $r$ centered at $(a,b)$, then $A = \pi r^2$. 


Proof. Let $A$ be the area of the circle of radius $r$ centered at $(a,b)$. As in the proof 
of Theorem 7.4.3, the circle of radius $r$ centered at $(a,b)$ is given by the equation 
$(x-a)^2 + (y-b)^2 = r^2$, which yields $y = b \pm \sqrt{r^2 - (x-a)^2}$. As was the case for 
the circumference, the symmetry of the circle allows us to find the area of the circle 
by finding the area of the upper right quarter of the circle and multiplying it by 4. 
(A proof of this fact makes use of Exercise 5.9.3, Exercise 5.9.4, Exercise 5.9.5 and 
Exercise 5.9.6; the details are left to the reader.) 

Let $h\colon [a,a+r] \to \mathbb{R}$ be defined by $h(x) = b + \sqrt{r^2 - (x-a)^2}$ for all $x \in [a,a+r]$. 
As in the proof of Theorem 7.4.3, the function $h$ is continuous. Hence $h$ is integrable 
by Theorem 5.4.11. By Theorem 5.9.11 we know that the area of the upper right 
quarter of the circle is $\int_a^{a+r} \sqrt{r^2 - (x-a)^2}\,dx$. Hence 

\[ A = 4\int_a^{a+r} \sqrt{r^2 - (x-a)^2}\,dx. \]

It is left to the reader to verify that the substitution $w = \frac{x-a}{r}$ yields 

\[ A = 4r^2\int_0^1 \sqrt{1-w^2}\,dw. \]


To make this substitution rigorous, it would be necessary to use Theorem 5.7.4, though 
we omit the details, which are straightforward given that all functions involved are 
continuous and differentiable where needed. 

We now use Integration by Parts, though we need to make use of the improper 
integration version of it, which is given in Theorem 6.4.13, because not all functions 
involved are defined for $w = 1$. Using $u = \sqrt{1-w^2}$ and $dv = dw$, we see that 

\[
\begin{aligned}
\int_0^1 \sqrt{1-w^2}\,dw &= \left[w\sqrt{1-w^2}\right]_0^1 + \int_0^1 \frac{w^2}{\sqrt{1-w^2}}\,dw \\
&= \int_0^1 \frac{w^2}{\sqrt{1-w^2}}\,dw = \int_0^1 \frac{1}{\sqrt{1-w^2}}\,dw - \int_0^1 \frac{1-w^2}{\sqrt{1-w^2}}\,dw \\
&= \int_0^1 \frac{1}{\sqrt{1-w^2}}\,dw - \int_0^1 \sqrt{1-w^2}\,dw.
\end{aligned}
\]

Solving for the original integral we obtain 

\[ \int_0^1 \sqrt{1-w^2}\,dw = \frac{1}{2}\int_0^1 \frac{1}{\sqrt{1-w^2}}\,dw. \]

Hence the area of a circle of radius $r$ centered at $(a,b)$ is given by 

\[ A = 4r^2\int_0^1 \sqrt{1-w^2}\,dw = 4r^2 \cdot \frac{1}{2}\int_0^1 \frac{1}{\sqrt{1-w^2}}\,dw = r^2 \cdot 2\int_0^1 \frac{1}{\sqrt{1-w^2}}\,dw = \pi r^2, \]

where the last equality holds by Definition 7.3.5. 


Observe that in the proof of Theorem 7.4.4 we never actually evaluated any 
integrals. All we did was relate the area of a circle to its circumference, when both of 
these numbers are expressed as integrals. 
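
The relation between the two integrals can be observed numerically. The following Python sketch approximates both integrals with the midpoint rule, which never evaluates the integrands at w = 1, and compares the first with half of the second. The helper names are hypothetical, the midpoint rule is a choice made only for this illustration, and because the second integral is improper its approximation converges slowly, so only rough agreement should be expected.

```python
import math


def midpoint_rule(func, lower, upper, pieces):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / pieces
    return width * sum(func(lower + (i + 0.5) * width) for i in range(pieces))


def area_integrand(w):
    """Integrand sqrt(1 - w^2) from the area computation."""
    return math.sqrt(1.0 - w * w)


def arc_integrand(w):
    """Integrand 1 / sqrt(1 - w^2) from the arc length computation."""
    return 1.0 / math.sqrt(1.0 - w * w)


area_integral = midpoint_rule(area_integrand, 0.0, 1.0, 200_000)
arc_integral = midpoint_rule(arc_integrand, 0.0, 1.0, 200_000)

# The first number should be close to half of the second.
print(area_integral, 0.5 * arc_integral)
```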

We conclude this section with a discussion of a very different aspect of π, which 
is the fact that it is an irrational number. We follow, with added details, the proof 
in [Jef73, Appendix II], which says about this proof: “The following was set as 
an example in the Mathematics Preliminary Examination at Cambridge in 1945 by 
Dame Mary Cartwright, but she has not traced its origin.” Nor, it appears, was the 
proof published by its creator. (Another proof of the irrationality of π, which is better 
known, is in [Niv47], though the exposition is quite terse; a variant of that proof, from 
[Spi67, Chapter 16], is seen in Exercise 7.4.3.) These proofs that π is irrational are all 
the type of proof where it is hard to get any good intuition about what is going on, 
and one simply follows the proof step by step to see how it goes. 

Our definition of π was given in Definition 7.3.5, but rather than using that 
definition directly in the proof that π is irrational, we use the following facts about π, 
sine and cosine: 


(1) π > 0; 
(2) the function sin is differentiable and sin′ = cos, and the function cos is 
differentiable and cos′ = −sin; 
(3) sin(π/2) = 1, and sin(−π/2) = −1, and cos 0 = 1, and cos(π/2) = 0, and 
cos(−π/2) = 0; 
(4) 0 ≤ cos x ≤ 1 for all x ∈ [−π/2, π/2]. 

Property (1) is given in Exercise 7.3.3; Property (2) is given in Theorem 7.3.12; and 
Properties (3) and (4) are given in Theorem 7.3.9 and Theorem 7.3.11. The reader who 
has not read Section 7.3 should nonetheless be familiar, at least informally, with these 
properties of π, sine and cosine, and can therefore read the proof of the following 
theorem without having to go back and read any of Section 7.3. This proof also uses 
the notion of factorials, which we defined in Example 2.5.12. 


Theorem 7.4.5. The number π is irrational. 


Proof. Let $g\colon [-1,1] \to \mathbb{R}$ be defined by $g(x) = \cos\left(\frac{\pi x}{2}\right)$ for all $x \in [-1,1]$, and for 
each $n \in \mathbb{N} \cup \{0\}$ let $f_n\colon [-1,1] \to \mathbb{R}$ be defined by $f_n(x) = \frac{(1-x^2)^n}{n!}$ for all $x \in [-1,1]$. 
It is left to the reader to verify that these functions satisfy the following properties; 
this verification makes use of Properties (2), (3) and (4) listed prior to this proof. 


(a) $f_n$ is twice differentiable. 

(b) $f_n(1) = 0$, and $f_n(-1) = 0$, and $f_n(0) \ne 0$, for all $n \in \mathbb{N} \cup \{0\}$ such that 
$n \ge 1$. 

(c) $f_n'(1) = 0$, and $f_n'(-1) = 0$, for all $n \in \mathbb{N} \cup \{0\}$ such that $n \ge 2$. 

(d) $f_0(x) = k$ for all $x \in [-1,1]$, for some $k \in \mathbb{Z}$. 

(e) $f_1''(x) = r$ for all $x \in [-1,1]$, for some $r \in \mathbb{Z}$. 

(f) $f_n'' = p_n f_{n-1} + q_n f_{n-2}$ for some $p_n, q_n \in \mathbb{Z}$, for all $n \in \mathbb{N}$ such that $n \ge 2$. 

(g) $0 \le f_n(x) \le \frac{1}{n!}$ for all $x \in [-1,1]$, for all $n \in \mathbb{N} \cup \{0\}$. 

(h) $g$ is twice differentiable. 

(i) $g(1) = 0$, and $g(-1) = 0$, and $g(0) \ne 0$, and $g'(1) - g'(-1) = c\pi$ for some 
$c \in \mathbb{Z} - \{0\}$. 

(j) $g'' = \frac{\pi^2}{d}\,g$ for some $d \in \mathbb{Z} - \{0\}$. 

(k) $0 \le g(x) \le 1$ for $x \in [-1,1]$. 


For the rest of this proof we will use only the above properties of $g$ and $f_n$, and not 
the definitions of these functions. 
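
Property (f) is the least obvious item in the list, and it can be checked by machine for small values of n. The following Python sketch represents each f_n by its list of exact rational coefficients and tests whether f_n″ = p_n f_{n−1} + q_n f_{n−2} holds with the particular integers p_n = −(4n − 2) and q_n = 4. These values of p_n and q_n are not given in the text; they are an assumption obtained by a direct computation, and all function names are hypothetical. Such a check covers only finitely many n and does not replace the verification left to the reader.

```python
from fractions import Fraction
from math import comb, factorial


def f_coeffs(n):
    """Coefficients (lowest degree first) of f_n(x) = (1 - x^2)^n / n!."""
    coeffs = [Fraction(0)] * (2 * n + 1)
    for k in range(n + 1):
        # Binomial expansion: (1 - x^2)^n = sum of (-1)^k C(n, k) x^(2k).
        coeffs[2 * k] = Fraction((-1) ** k * comb(n, k), factorial(n))
    return coeffs


def derivative(coeffs):
    """Coefficients of the derivative of the given polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:] or [Fraction(0)]


def combine(p, u, q, v):
    """Coefficients of p*u + q*v, padding the shorter list with zeros."""
    size = max(len(u), len(v))
    u = u + [Fraction(0)] * (size - len(u))
    v = v + [Fraction(0)] * (size - len(v))
    return [p * x + q * y for x, y in zip(u, v)]


def recurrence_holds(n):
    """Test f_n'' = p_n f_{n-1} + q_n f_{n-2} with the assumed p_n and q_n."""
    lhs = derivative(derivative(f_coeffs(n)))
    rhs = combine(-(4 * n - 2), f_coeffs(n - 1), 4, f_coeffs(n - 2))
    return lhs == rhs


print([recurrence_holds(n) for n in range(2, 9)])
```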


By Property (j) we see that $\left[\frac{d}{\pi^2}\,g'\right]' = \frac{d}{\pi^2}\,g'' = g$, and hence $\frac{d}{\pi^2}\,g'$ is an antiderivative 
of $g$. 

Suppose that $\pi$ is rational. Then by Property (1) listed prior to this proof and 
Lemma 2.4.12 (2) there are $a, b \in \mathbb{N}$ such that $\pi = \frac{a}{b}$. 

Let $n \in \mathbb{N} \cup \{0\}$. By Properties (a) and (h) we see that each of $f_n$ and $g$ is differentiable, 
and hence by Theorem 4.2.4 we know that each of these functions is continuous. By 
Corollary 3.3.6 it follows that $f_n g$ is continuous, and hence by Theorem 5.4.11 we 
know that $f_n g$ is integrable. Let 

\[ J_n = a^{2n+1}\int_{-1}^{1} f_n(x)\,g(x)\,dx. \]


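We now show that $J_n \in \mathbb{Z}$ for all $n \in \mathbb{N} \cup \{0\}$. The proof is by induction on $n$, using 
the variant of induction stated in Theorem 2.5.4; we start the induction at $n = 0$ rather 
than $n = 1$, but that is not a problem. In this proof by induction we will repeatedly use 
the fact that $\pi = \frac{a}{b}$ and what we saw above about an antiderivative of $g$. We will also 
repeatedly use Integration by Parts for Definite Integrals (Theorem 5.7.6), though for 
the sake of brevity we will use the standard notation for Integration by Parts found 
in calculus courses. It is left to the reader to use Properties (a), (e), (f), (h) and (j) to 
verify that the hypotheses of Integration by Parts for Definite Integrals hold as needed. 

First, using Properties (d) and (i), we see that 

\[ J_0 = a^{0+1}\int_{-1}^{1} f_0(x)\,g(x)\,dx = ak\int_{-1}^{1} g(x)\,dx = ak\left[\frac{d}{\pi^2}\,g'(x)\right]_{-1}^{1} = \frac{akd}{\pi^2}\cdot c\pi = \frac{acdk}{\pi} = bcdk. \]

Hence $J_0 \in \mathbb{Z}$. 

Second, this time using Properties (b), (i) and (e), together with the equality 
$g = \frac{d}{\pi^2}\,g''$ from Property (j), we see that 

\[
\begin{aligned}
J_1 &= a^3\int_{-1}^{1} f_1(x)\,g(x)\,dx = \frac{a^3 d}{\pi^2}\int_{-1}^{1} f_1(x)\,g''(x)\,dx \\
&= \frac{a^3 d}{\pi^2}\left\{\Big[f_1(x)\,g'(x)\Big]_{-1}^{1} - \Big[f_1'(x)\,g(x)\Big]_{-1}^{1} + \int_{-1}^{1} f_1''(x)\,g(x)\,dx\right\} \\
&= \frac{a^3 d}{\pi^2}\cdot r\int_{-1}^{1} g(x)\,dx = \frac{a^3 d}{\pi^2}\cdot r\cdot\frac{d}{\pi^2}\cdot c\pi = \frac{a^3 c d^2 r}{\pi^3} = b^3 c d^2 r.
\end{aligned}
\]

Hence $J_1 \in \mathbb{Z}$. 

Now let $n \in \mathbb{N} \cup \{0\}$, and suppose that $n \ge 2$. Suppose further that $J_k \in \mathbb{Z}$ for all 
$k \in \{0, \ldots, n-1\}$. Using Properties (b), (c) and (f) we see that 

\[
\begin{aligned}
J_n &= a^{2n+1}\int_{-1}^{1} f_n(x)\,g(x)\,dx = \frac{a^{2n+1} d}{\pi^2}\left\{\Big[f_n(x)\,g'(x)\Big]_{-1}^{1} - \int_{-1}^{1} f_n'(x)\,g'(x)\,dx\right\} \\
&= \frac{a^{2n+1} d}{\pi^2}\left\{-\Big[f_n'(x)\,g(x)\Big]_{-1}^{1} + \int_{-1}^{1} f_n''(x)\,g(x)\,dx\right\} \\
&= \frac{a^{2n+1} d}{\pi^2}\int_{-1}^{1} \big[p_n f_{n-1}(x) + q_n f_{n-2}(x)\big]\,g(x)\,dx \\
&= \frac{d}{\pi^2}\left\{p_n a^2 a^{2n-1}\int_{-1}^{1} f_{n-1}(x)\,g(x)\,dx + q_n a^4 a^{2n-3}\int_{-1}^{1} f_{n-2}(x)\,g(x)\,dx\right\} \\
&= b^2 d\left\{p_n J_{n-1} + q_n a^2 J_{n-2}\right\}.
\end{aligned}
\]

By the inductive hypothesis we know that $J_{n-1} \in \mathbb{Z}$ and $J_{n-2} \in \mathbb{Z}$, and it follows that 
$J_n \in \mathbb{Z}$, which completes the inductive step. Hence $J_n \in \mathbb{Z}$ for all $n \in \mathbb{N} \cup \{0\}$. 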

Let $n \in \mathbb{N} \cup \{0\}$. By Properties (g) and (k) we see that $0 \le a^{2n+1} f_n(x)\,g(x) \le \frac{a^{2n+1}}{n!}$ 
for all $x \in [-1,1]$, and by Properties (b) and (i) we see that $a^{2n+1} f_n(0)\,g(0) > 0$. Using 
Theorem 5.3.2 (3) and Exercise 5.5.7 we deduce that $0 < J_n \le \frac{2a^{2n+1}}{n!}$. 

Next, we claim that there is some $k \in \mathbb{N}$ such that $\frac{2a^{2k+1}}{k!} < 1$. This claim can be 
proved most easily using sequences and series, which will be discussed in detail in 
Chapters 8 and 9. More specifically, let $c = a^2$. The final remark in Example 9.5.2 (2) 
implies that 

\[ \lim_{n \to \infty} \frac{2a^{2n+1}}{n!} = \lim_{n \to \infty} \frac{2ac^n}{n!} = 0, \]

which means intuitively that $\frac{2ac^n}{n!}$ can be made as 
small as desired if $n$ is made sufficiently large, and that in turn can be seen to imply 
the claim by using the definition of the convergence of sequences given in Section 8.2. 
Although Example 9.5.2 (2) comes later in the text, nothing in the present section is 
used in that example, and there is no fear of circular reasoning if we use that example 
in the present proof. For the reader who will not read Section 9.5, or who cannot wait 
until then to see all of the details of the present proof, an alternative (and ad hoc) 
proof of the claim that does not use sequences is found in Exercise 7.4.2, where again 
we think of $c$ as $a^2$. 

It follows from the above claim that $\frac{2a^{2k+1}}{k!} < 1$. We saw above that $0 < J_k \le \frac{2a^{2k+1}}{k!}$, 
and therefore $0 < J_k < 1$. On the other hand, we saw that $J_k$ is an integer, which is a 
contradiction to Theorem 2.4.10 (2). Hence $\pi$ is irrational. 
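
The claim that some k ∈ ℕ satisfies 2a^(2k+1)/k! < 1 expresses the fact that factorials eventually outgrow powers, and it is easy to watch this happen for small values of a. The following Python sketch finds the smallest such k using exact integer arithmetic. Here a stands for the numerator of the hypothetical fraction a/b in the proof; the function name and the sample values of a are assumptions made for this illustration only, and the loop terminates precisely because of the claim.

```python
from math import factorial


def first_small_index(a):
    """Smallest k in N with 2 * a**(2k + 1) / k! < 1, by exact integer comparison."""
    k = 1
    # The inequality 2 * a**(2k + 1) / k! < 1 is equivalent to 2 * a**(2k + 1) < k!.
    while 2 * a ** (2 * k + 1) >= factorial(k):
        k += 1
    return k


for a in (1, 2, 3):
    print(a, first_small_index(a))
```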


It should be mentioned that π is not only an irrational number, but it is in fact a 
“transcendental number,” which means that it is not the root of any polynomial with 
rational coefficients. By comparison, the number √2 is also irrational, but it is not 
transcendental, because it is a root of the equation x² − 2 = 0. A real number that is 
the root of a polynomial equation with rational coefficients is called an “algebraic 
number.” See [Ste04, Section 24.3] for a proof that π is transcendental. 

Finally, we note that there is a glaring omission in our treatment of π in this 
section, which is that we have not provided a method for computing the first few digits 
of the decimal expansion of π. One could in principle try to approximate π by drawing 
a circle very carefully, measuring the circumference and diameter, and dividing the 
former by the latter; of course, such a method is not very satisfying. There are many 
methods for computing the first few (or many) digits of the decimal expansion of 
π; one such method, which is based upon series, will be seen in Example 10.4.17. 
For more information about computing the digits of the decimal expansion of π, and 
about π in general, see [AH01] or [EL04]; see [BBB04] for historical sources about 
π. 


Reflections 


The material in this section is the least central to the study of real analysis of 
any section in this book. Most introductory courses in real analysis do not give a 
thorough discussion of the trigonometric functions, and in particular do not discuss 
the number π in detail. However, given that we defined the trigonometric functions 
in Section 7.3, and therefore had to define π in the process, it would be a pity not 
to discuss the interesting and familiar facts about π discussed in the present section, 
especially because they are proved using some results from real analysis that were 
seen previously in this text. 

That the proof of the irrationality of π is tricky is not surprising, but it is somewhat 
surprising that the proofs of the familiar formulas for the circumference and the area 
of a circle take as much effort as they do. In fact, the only reason it is surprising is 
that these familiar formulas are all too familiar: we learn them at a very early age, so 
early that we accept them as true simply because we were told so by our teachers. In 
particular, the reason that the proofs of the formulas for the circumference and the 
area of a circle are trickier than might at first be expected is because of the appearance 
of improper integrals in the definition of π and in the formula for the arc length of a 
circle. 


Exercises 


Exercise 7.4.1. In Section 7.3 we defined the number $\pi$, and the arcsine function, 
using improper integrals. The purpose of this exercise is to show that it is possible to 
define $\pi$ and arcsine using integrals but avoiding improper integrals. Hence the sine 
and cosine functions can be defined without using improper integrals. This alternative 
approach is slightly longer than the approach we used in Section 7.3, and hence we 
used the latter approach in the text for the sake of brevity. 

Let $A\colon (-1,1) \to \mathbb{R}$ be defined by 

\[ A(x) = \int_0^x \frac{1}{\sqrt{1-t^2}}\,dt \]

for all $x \in (-1,1)$. Observe that the function $A$ is the restriction of the arcsine function 
as defined in Definition 7.3.6 to $(-1,1)$. Because the domain of $A$ does not include $1$ 
and $-1$, then $A$ is defined without improper integration. Using the same ideas as in 
the proof of Lemma 7.3.7, it can be verified that $A(-x) = -A(x)$ for all $x \in (-1,1)$, 
and that $A$ is differentiable and strictly increasing; we omit the details. 


(1) Prove that the function $A$ is bounded. Do not use the arcsine function in your 
proof, because we are going to provide an alternative definition of that function 
subsequently in this exercise. 

(2) By Part (1) of this exercise, the Least Upper Bound Property and the Greatest 
Lower Bound Property we see that $A((-1,1))$ has a least upper bound 
and a greatest lower bound. Prove that $\operatorname{glb} A((-1,1)) = -\operatorname{lub} A((-1,1))$. 
[Use Exercise 2.6.5.] 

(3) We now let $\pi$ be defined by $\pi = 2\operatorname{lub} A((-1,1))$, where this least upper bound 
exists as observed in Part (2). Prove that this definition of $\pi$ is equivalent to 
Definition 7.3.5. [Use Exercise 4.5.9 and Exercise 4.5.10 (1).] 

(4) Using the previous parts of this exercise we now let the arcsine function be 
the function $\arcsin\colon [-1,1] \to \mathbb{R}$ defined by 

\[
\arcsin x =
\begin{cases}
-\frac{\pi}{2}, & \text{if } x = -1 \\
A(x), & \text{if } x \in (-1,1) \\
\frac{\pi}{2}, & \text{if } x = 1.
\end{cases}
\]

Prove that the function arcsin as defined above is continuous. (The other 
standard properties of arcsine can also be proved using this definition.) 
[Use Exercise 4.6.5 (1).] 


Exercise 7.4.2. [Used in Theorem 7.4.5 and Exercise 7.4.3.] Let $c, p \in (0,\infty)$. We 
prove that there is some $n \in \mathbb{N}$ such that $\frac{c^n}{n!} < p$. 


(1) By using Corollary 2.6.8 (1) there is some $k \in \mathbb{Z}$ such that 

\[ \frac{1}{p}\cdot\frac{c^{2c}}{(2c)!} < k. \]

By choosing a larger value of $k$ if necessary, we may suppose that $k \in \mathbb{N}$. Prove 
that 

\[ \frac{c^{2c}}{(2c)!} < 2^k p. \]

(2) Let $n = 2c + k$. Prove that $\frac{c^n}{n!} < p$. 


Exercise 7.4.3. [Used in Section 7.4.] In this exercise we give an alternative proof, 
from [Spi67, Chapter 16], that $\pi$ is irrational. As in the proof of Theorem 7.4.5, here 
too we need the definition of $n!$, given in Example 2.5.12, and we need only a few 
facts about $\pi$, sin and cos, which this time are: 


(1) $\pi > 0$; 
(2) the function sin is differentiable and $\sin' = \cos$, and the function cos is 
differentiable and $\cos' = -\sin$; 
(3) $\sin 0 = 0$, and $\sin\pi = 0$, and $\cos 0 = 1$, and $\cos\pi = -1$; 
(4) $0 < \sin x \le 1$ for $x \in (0,\pi)$. 


Property (1) is given in Exercise 7.3.3; Property (2) is given in Theorem 7.3.12; and 
Properties (3) and (4) follow from Theorem 7.3.9 and Theorem 7.3.11. 

Parts (1), (2) and (3) of this exercise are preliminaries; the actual proof that $\pi$ is 
irrational starts after that. Let $n \in \mathbb{N}$. 


(1) Let $p\colon \mathbb{R} \to \mathbb{R}$ be a polynomial function of the form

$$p(x) = c_n x^n + c_{n+1} x^{n+1} + \cdots + c_{2n} x^{2n}$$

for all $x \in \mathbb{R}$, where $c_n, c_{n+1}, \ldots, c_{2n} \in \mathbb{Z}$. Prove the following properties.
(i) $p^{(i)}(0) = 0$ for all $i \in \{0, \ldots, n-1\}$.
(ii) $p^{(i)}(0)$ is an integer that is divisible by $n!$ for all $i \in \{n, n+1, \ldots, 2n+1\}$.
(iii) If $p(1-x) = p(x)$ for all $x \in \mathbb{R}$, then $p^{(i)}(1) = 0$ for all $i \in \{0, \ldots, n-1\}$
and $p^{(i)}(1)$ is an integer that is divisible by $n!$ for all $i \in \{n, n+1, \ldots, 2n+1\}$.
(2) Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by

$$f(x) = \frac{x^n (1-x)^n}{n!}$$

for all $x \in \mathbb{R}$. Prove that $f$ satisfies the following properties.
(a) $f(0) = 0$, and $f(\frac{1}{2}) > 0$, and $f(1) = 0$.
(b) $f$ is infinitely differentiable.
(c) $f^{(2n+2)}$ is the zero function.
(d) $f^{(i)}(0)$ and $f^{(i)}(1)$ are integers for all $i \in \mathbb{N} \cup \{0\}$.
(e) $0 < f(x) < \frac{1}{n!}$ for all $x \in (0,1)$.
(3) Prove that there is a function $g\colon \mathbb{R} \to \mathbb{R}$ that satisfies the following properties.
(f) $g$ is twice differentiable.
(g) $g'' = -\pi^2 g$.
(h) $g(0) = 0$, and $g(1) = 0$, and $g'(0) = \pi$, and $g'(1) = -\pi$.
(i) $0 \le g(x) \le 1$ for $x \in [0,1]$.

(4) Suppose that $\pi$ is rational. Then by Corollary 2.4.14 we know that $\pi^2$ is
rational. Because $\pi > 0$, then $\pi^2 > 0$. By Lemma 2.4.12 (2) we know that
$\pi^2 = \frac{a}{b}$ for some $a, b \in \mathbb{N}$. Let $h\colon \mathbb{R} \to \mathbb{R}$ be defined by

$$h = \sum_{i=0}^{n} (-1)^i a^{n-i} b^i \left[ f^{(2i+1)} g - f^{(2i)} g' \right].$$

Prove that $h' = \pi^2 a^n f g$. It follows that $h$ is an antiderivative of $\pi^2 a^n f g$.

(5) Prove that $\pi a^n f g$ satisfies the hypotheses of the Fundamental Theorem of
Calculus Version II (Theorem 5.6.4), and then use that theorem to prove that

$$\int_0^1 \pi a^n f(x) g(x)\,dx = \sum_{i=0}^{n} (-1)^i a^{n-i} b^i \left[ f^{(2i)}(1) + f^{(2i)}(0) \right].$$

By Property (d) it follows that $\int_0^1 \pi a^n f(x) g(x)\,dx$ is an integer.
(6) It follows from Properties (e) and (i) that $0 \le \pi a^n f(x) g(x) < \frac{\pi a^n}{n!}$ for all
$x \in [0,1]$. By an argument similar to one used in the proof of Theorem 7.4.5,
and also found in Exercise 7.4.2, there is some $k \in \mathbb{N}$ such that $\frac{\pi a^k}{k!} < 1$.
Prove that $0 < \int_0^1 \pi a^k f(x) g(x)\,dx < 1$. By Part (5) of this exercise, where
the choice of $n$ was arbitrary, we know that $\int_0^1 \pi a^k f(x) g(x)\,dx$ is an
integer, which is a contradiction to Theorem 2.4.10 (2). Hence $\pi$ is
irrational. [Use Exercise 5.5.7.]


7.5 Historical Remarks 
The material in this chapter starts with logarithmic and exponential functions, then
trigonometric functions, and ends with an additional look at the number $\pi$. This order
allows us to proceed from the less complicated material to the more complicated,
though historically this order is backwards. The number $\pi$, or at least approximations
to it, are found in many ancient cultures, for the understandable reason that it was
important to find the circumference and area of a circle; trigonometric concepts
(originally viewed geometrically, prior to the development of the function concept)
also have their roots in the ancient world; logarithms are a much later invention.
We will discuss the history of each of these three concepts separately, in the proper
historical order.


The Number $\pi$


We are familiar with the number $\pi$ from two famous formulas, namely, the formulas
$C = \pi D$ and $A = \pi r^2$. It was recognized very early historically that the ratio $\frac{C}{D}$ is the
same in all circles; it was also known early that the ratio $\frac{A}{r^2}$ is the same in all circles.
However, it was apparently less widely known that these two ratios were the same; 
Euclid (c. 325-c. 265 BCE), for example, did not state this equality in the Elements. 
The ancient Babylonians and Chinese, on the other hand, knew that the area of a circle 
is half the circumference times half the diameter, a fact that in essence says that the 
two ratios are equal. Archimedes (287-212 BCE), in The Measurement of the Circle 
of around 250 BCE, proved the equivalent fact that the area of a circle is equal to the 
area of a right triangle with base equal to the circumference and height equal to the 
radius. 

Mathematics originated in practical considerations, and before there was theoretical
discussion of the meaning and various uses of the number $\pi$, there were
approximations of its value via the ratio $\frac{C}{D}$. A number of ancient civilizations used
the value of 3 as an approximation to $\pi$. For example, in I Kings 7:23 it states “And
he made the molten sea of ten cubits from brim to brim, round in compass, and
the height thereof was five cubits; and a line of thirty cubits did compass it round
about” (translation from [MM]), which would imply that $\pi$ is 3. However, much better
approximations were obtained quite early in the ancient world. 

The ancient Babylonians, in the period 1900-1600 BCE, had the approximation
$\frac{25}{8}$ for $\pi$, and the ancient Egyptians, in the Rhind Papyrus of around 1850 BCE
or earlier, had the approximation $\left(\frac{16}{9}\right)^2$. Both of these approximations for $\pi$ are
within 0.02 of the correct value. In ancient India, from 600 BCE or earlier, they had
the approximation (2283) for $\pi$. The approximation $\sqrt{10}$ was apparently known in
ancient India no later than 150 BCE. The approximation $\sqrt{2} + \sqrt{3}$ is attributed to
Plato (427-347 BCE).

Archimedes, in The Measurement of the Circle, gave the first systematic method
for finding approximations for $\pi$, as opposed to previous approximations, which were
arrived at by experimental methods. Archimedes’ method was to find the perimeters
of regular polygons that are circumscribed about a circle and regular polygons that
are inscribed in it, providing upper bounds and lower bounds for the circumference of
the circle. He used regular polygons with up to 96 sides, leading to $\pi$ being between
$3\frac{10}{71}$ and $3\frac{1}{7}$, the latter yielding the approximation $\frac{22}{7}$ (which is so commonly used
that it is sometimes mistakenly thought to equal $\pi$). Archimedes’ method, with some
variation, was the main method (at least in the West) for approximating $\pi$ for almost
two millennia, and after Archimedes it was used with polygons with ever more sides
to obtain ever greater accuracy.
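
The idea of trapping the circumference between inscribed and circumscribed polygons can be sketched numerically as follows. This is a modern illustration under stated assumptions, not Archimedes’ own procedure: it uses floating-point square roots and a standard side-doubling recurrence (harmonic mean, then geometric mean), whereas Archimedes worked with rational estimates. The function name `polygon_bounds` is hypothetical.

```python
import math


def polygon_bounds(doublings: int) -> tuple[float, float]:
    """Return (inscribed, circumscribed) perimeters for a circle of diameter 1.

    Starts from regular hexagons and doubles the number of sides
    `doublings` times, so 4 doublings give polygons with 96 sides.
    """
    inscribed = 3.0                        # regular hexagon inscribed in the circle
    circumscribed = 2.0 * math.sqrt(3.0)   # regular hexagon circumscribed about it
    for _ in range(doublings):
        # The harmonic mean gives the circumscribed polygon with twice the sides;
        # the geometric mean then gives the corresponding inscribed polygon.
        circumscribed = 2.0 * circumscribed * inscribed / (circumscribed + inscribed)
        inscribed = math.sqrt(circumscribed * inscribed)
    return inscribed, circumscribed


lower, upper = polygon_bounds(4)  # 96-sided polygons
print(lower, upper)
```

Because the circle has diameter 1, its circumference is $\pi$, so the two returned perimeters are a lower bound and an upper bound for $\pi$; with 96 sides they fall inside the interval from $3\frac{10}{71}$ to $3\frac{1}{7}$ mentioned above.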

In 150, Claudius Ptolemy (c. 85-c. 165) had the approximation $\frac{377}{120}$, which is better
than $\frac{22}{7}$. In ancient China various approximations to $\pi$ were known. For example, the
approximation $\frac{3927}{1250}$, which is better than $\frac{22}{7}$, was given by Liu Hui (c. 220-c. 280)
in around 263, using inscribed polygons, though similarly to Archimedes providing 
both upper and lower bounds. Liu Hui was aided by the fact that he had the decimal 
system for writing numbers, which was not available in ancient Greece. An even better 
approximation to $\pi$ was found by Zu Chongzhi (429-500), improving the accuracy
of Liu Hui by two decimal places; this approximation appears to have been the best
approximation for the next 800 years. In India, Aryabhata (476-550), also known as
Aryabhata I, had the approximation $\frac{62832}{20000}$ in Aryabhatiya of 499.

Leonardo of Pisa (1170-1250), also known as Fibonacci, without making use of
Archimedes’ work, used a 96-sided polygon to obtain the approximation $\frac{864}{275}$. Not
much progress was made in approximating $\pi$ in medieval Europe, though better results
were obtained elsewhere. Ghiyath al-Din Jamshid Mas’ud al-Khashi (1390-1450),
around 1430, used a polygon with $3 \cdot 2^{28}$ sides to arrive at an approximation that is
good to 16 decimal places. Additionally, a number of series for $\pi$ were developed
in India by the 15th century, well before such series were found
in Europe. Among these series were $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$, with a correction
factor that improves the accuracy, and
$\pi = \sqrt{12}\left(1 - \frac{1}{3 \cdot 3} + \frac{1}{5 \cdot 3^2} - \frac{1}{7 \cdot 3^3} + \cdots\right)$.

A new method for computing $\pi$ was introduced in 1579 by François Viète (1540-
1603), who combined Archimedes’ method with trigonometry, and used a polygon
with $3 \cdot 2^{17}$ sides, to arrive at an approximation that is good to 9 decimal places.
Better results were subsequently obtained with similar methods using polygons with
more sides, eventually obtaining 39 decimal place accuracy in 1630. In the mid-17th
century other variants of the method of inscribed and circumscribed polygons were
made, with more complicated procedures for going from one polygon to the next;
such approaches were taken, for example, by James Gregory (1638-1675) and René
Descartes (1596-1650).

A revolution in computing $\pi$ took place in mid-17th-century Europe due to the
introduction of methods based upon series and calculus, rather than approximating
circles with polygons. First, there were infinite products for $\pi$ by Viète in 1593
and John Wallis (1616-1703) in 1655, and a continued fraction formula for $\pi$ by
William Brouncker (1620-1684) in 1658. Series for $\pi$ were found simultaneously
with the advent of calculus. In 1665-1666 Isaac Newton (1643-1727) found the series
$\pi = \frac{3\sqrt{3}}{4} + 24\left(\frac{1}{12} - \frac{1}{5 \cdot 2^5} - \frac{1}{28 \cdot 2^7} - \frac{1}{72 \cdot 2^9} - \cdots\right)$. Gottfried von Leibniz (1646-1716), in
late 1673 or early 1674, found the power series for the arctangent function and used it
to obtain the series $\pi = 4 \arctan 1 = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right)$, which was known earlier
in India.

Leibniz’ series for $\pi$ converges very slowly, but a much faster converging series
was found by John Machin (1680-1752) in 1706 using the formula
$\pi = 16 \arctan \frac{1}{5} - 4 \arctan \frac{1}{239}$ together with the power series for arctangent. Using this method, Machin
obtained the first 100 decimal places of $\pi$. Variants of this arctangent method were
used until around 1970, first by hand, and eventually by computer.
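
To see how an arctangent formula of Machin’s type produces digits, here is a minimal Python sketch. It assumes the formula $\pi = 16 \arctan \frac{1}{5} - 4 \arctan \frac{1}{239}$ and the power series $\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots$, and it uses scaled integer arithmetic with a few guard digits; the function names and the choice of ten guard digits are illustrative assumptions, not a reconstruction of Machin’s hand computation.

```python
def arctan_inverse(x: int, unity: int) -> int:
    """Approximate unity * arctan(1/x) with the arctangent power series."""
    term = unity // x          # unity * (1/x)
    total = term
    x_squared = x * x
    n = 3
    sign = -1
    while term:
        term //= x_squared     # unity * (1/x)**n for the current odd n
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def machin_pi(digits: int) -> int:
    """Return the integer part of pi * 10**digits, using Machin's formula."""
    guard = 10                 # extra digits to absorb truncation error
    unity = 10 ** (digits + guard)
    scaled = 16 * arctan_inverse(5, unity) - 4 * arctan_inverse(239, unity)
    return scaled // 10 ** guard


print(machin_pi(20))  # prints 314159265358979323846
```

The series for $\arctan \frac{1}{5}$ gains more than one decimal digit per term, which is why this approach is so much faster than using $4 \arctan 1$ directly.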

Although the symbol “$\pi$” is a Greek letter, the use of this symbol to denote the
ratio of the circumference to the diameter of a circle is due not to ancient Greek
mathematicians, but rather to William Jones (1675-1749) in 1706. However, given
that the mathematical ideas about $\pi$ in Jones’ book are due to Machin, as Jones
himself said, it is possible that the symbol $\pi$ is due to Machin as well. The use of $\pi$
as we now use it did not catch on from Jones’ work, and other symbols were used at
the time, but Leonhard Euler (1707-1783) used $\pi$ in a paper in 1736, and again in
the influential textbook Introductio in analysin infinitorum of 1748, and the symbol
became widespread thereafter.

Euler provided a number of nice results involving $\pi$. For example, he proved in
1736 that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$, a result that had eluded great mathematicians such as Leibniz,
Jakob Bernoulli (1654-1705) and Johann Bernoulli (1667-1748). In Introductio in
analysin infinitorum Euler also proved the formula $e^{ix} = \cos x + i \sin x$, which is
very important in complex analysis, and from which the famous formula $e^{i\pi} + 1 = 0$
is deduced, showing a relation between the two important numbers $\pi$ and $e$.

The irrationality of $\pi$ was suspected in the 15th century, but it was first proved
only in 1766 by Johann Lambert (1728-1777). The transcendence of $\pi$ was proved by
Ferdinand von Lindemann (1852-1939) in 1882, a fact that implied that the ancient
Greek quest to square the circle with straightedge and compass was doomed to fail.

A number of algorithms for computing $\pi$, by series approximations and other
methods, were developed prior to the computer era. Around 1800 Carl Friedrich
Gauss (1777-1855) invented a particularly fast algorithm for computing $\pi$ based upon
the arithmetic-geometric mean. The best approximation of $\pi$ prior to the computer era
appears to have been 1120 decimal places. With the aid of computers, the world record
today is billions of decimal places. At first computers used older algorithms, such
as Machin’s arctangent method, but eventually newer methods were developed, both
algorithms specific to computing the decimal expansion of $\pi$, including a rediscovery
of Gauss’ algorithm, as well as a faster method of multiplying numbers, known as
Fast Fourier Transform multiplication, which speeded up any algorithm that involved
multiplication, including those for computing $\pi$.


Trigonometry 


Whereas today we think of trigonometry in terms of the six trigonometric functions, 
the subject started out rather differently. For example, the ancient Greeks, who 
originally did not have angle measure, did not use these functions. Nonetheless, 
Propositions 12—13 in Book II of Euclid’s Elements are equivalent to what we call the 
Law of Cosines, and as such represent trigonometric ideas. 

The need for something analogous to the trigonometric functions, and for the 
associated tables of values needed for calculations in an era without computing 
technology, arose in the study of astronomy because of the use of angles as coordinates 
for heavenly bodies, an idea due to the ancient Babylonians. Hipparchus of Nicaea 
(190-120 BCE), also known as Hipparchus of Rhodes, was among the first ancient
Greeks to use the Babylonian system of measuring angles by dividing the circle into
360°, and he was probably the first person to compile a trigonometric table. This 
table, rather than giving values of a function as we think of it today, gave the lengths 
of chords subtended by arcs in a given circle (in a unit circle the length of the arc 
represents the angle via radian measure, and the length of the chord is twice the sine 
of half the angle subtended by the arc). 

The most important work on trigonometry in the ancient world was in Mathematiki 
Syntaxis of around 150 by Ptolemy. Because of its importance this work was referred 
to as Megisti Syntaxis, which means “Greatest Compilation,” and that title became 
al-magisti in Arabic, which was then Latinized into Almagest, the common name 
for the work. The Almagest is a treatise on astronomy, stating what we now call the 
Ptolemaic system, but it includes trigonometry for use in astronomical calculations. 
Ptolemy had a trigonometric table (also consisting of lengths of chords in a circle), 
and had a theorem that included as a special case the equivalent of our formula for 
sin(x — y). He also had the equivalent of our half-angle formula for sine, and the 
equivalent of the Law of Sines. Ptolemy constructed his trigonometric table using his 
geometric equivalents of such formulas. 

The sine function was invented in India, where, as in ancient Greece, trigonometric 
ideas were used for astronomical calculations. Indian trigonometry appears to have 
been influenced by ancient Greek work on this subject; later European trigonometry 
would, in turn, be influenced by the development of Indian trigonometry beyond what 
was imported from Greece. 

In the Siddhantas, including the Paitamaha Siddhanta of the 5th century and the 
later Surya Siddhanta, rather than using tables of the lengths of chords subtended by 
arcs (corresponding to angles), there were more convenient tables of lengths of half 
the chords of double the angles; for a unit circle such lengths would be the same as 
the values of the sine function we use today. Other trigonometric functions, such as 
cosine, arcsine, tangent and secant, were also considered. Our word “sine” derives 
from the Sanskrit word “jiva,” which was rendered “jiba” in Arabic; Robert of Chester
(who translated mathematics from Arabic into Latin in the 12th century) thought the 
word was “jaib,” which means bay or inlet in Arabic, and so he translated it as “sinus” 
in Latin, which means inlet, bosom or any welcoming fold (the fact that Arabic is 
sometimes written without vowels might have been the source of this error). 

Aryabhata, in the Aryabhatiya of 499, had a sine table. Varahamihira (505-587), 
in the Pancasiddhantika of 575, had a more accurate sine table than Aryabhata, as well 
as a cosine table, and some identities relating these two functions. Brahmagupta (598— 
670), in the Brahmasphuta Siddhanta of 628, had ideas that implied the Law of Sines. 
Bhaskara II (1114-1185), also known as Bhaskaracharya, showed the equivalent of
the fact that the derivative of sine is cosine, as well as the equivalent of our formulas
for $\sin(x+y)$ and $\sin(x-y)$.

The earliest equivalent of a tangent table appeared in China, in the Ta yen li of Yi
Xing (683-727), who was influenced by Indian astronomy and trigonometry.

Arab mathematics was influenced by both the ancient Greeks and the Indians. 
In trigonometry the Arab mathematicians at first used both chord tables as in the 
Almagest and sine tables as in India, though they eventually settled on sine tables. All
six trigonometric functions, viewed geometrically, were used in the Arab world by
the 9th century. 

Similarly to the spread of the Indian place-value system, Indian trigonometry 
came to Europe via the Arab world, for example through the work on astronomy and 
trigonometry of Abu Abdallah Mohammad ibn Jabir al-Battani (c. 850-929), also 
known as Albategnius, whose Kitab al-Zij was translated into Latin. Mohammad 
Abu’l-Wafa Al-Buzjani (940-998) had all six trigonometric functions and relations
between them, double-angle and half-angle formulas, and a sine table with angles at 
0.25° intervals and with accuracy equivalent to eight decimal places. The trigonomet- 
ric functions in India were defined in circles of arbitrary size, whereas in the Arab 
world they were generally defined using the unit circle, as we now do. The Treatise 
on the Quadrilateral by Nasir al-Din al-Tusi (1201-1274) was the first treatise on 
trigonometry (planar and spherical) in its own right, independent of astronomy. 

The first treatise on trigonometry in its own right in Europe was De triangulis 
omnimodis of 1464 by Regiomontanus (1436-1476), also known as Johann Müller of
Königsberg, who may have been influenced by the work of al-Tusi. Regiomontanus
had many results about right triangles, the Law of Sines (with proof), many examples 
of solving problems with triangles, and some results on spherical trigonometry. This 
work did not have the tangent function, though another work of Regiomontanus, 
Tabulae directionum of 1467, had it, as well as a sine table using sexagesimal numbers; 
in 1468 Regiomontanus had a sine table using decimals. 

The famous text of Nicolaus Copernicus (1473-1543), De revolutionibus orbium 
coelestium of 1543, contained not only his important work on astronomy, but also 
some sections on trigonometry, this material having been previously published sepa- 
rately in 1542. Copernicus’ work on trigonometry was possibly influenced by Ptolemy 
and Regiomontanus. Copernicus’ student Georg Joachim Rheticus (1514-1574) took 
ideas of Regiomontanus and Copernicus and his own ideas and wrote the extensive 
treatise Opus palatinum de triangulis (completed and published in 1596 after Rheti- 
cus’ death), which included tables for all six trigonometric functions; he defined sine 
and cosine in terms of right triangles, rather than in terms of a circle. 

Viète, in works of 1571 and 1593, had tables of all six trigonometric functions,
solved problems with triangles and had trigonometric identities. One of the identities
Viète had was $\sin x + \sin y = 2 \sin \frac{x+y}{2} \cos \frac{x-y}{2}$, which can be rewritten as
$\sin(A+B) + \sin(A-B) = 2 \sin A \cos B$, which in turn was used to convert products of numbers into
sums prior to the invention of logarithms.

Prior to Euler the trigonometric functions were thought of not as functions in 
the modern sense but as lines (or the lengths of lines) related to the unit circle; 
Euler was the first person to view the trigonometric functions as functions per se. 
In 1739 Euler discussed harmonic oscillators, and the sine function was a solution 
of a differential equation; in 1743 he used the sine and cosine functions in his 
method for solving linear differential equations in general. All of this work led 
to the trigonometric functions being viewed as yet another type of transcendental 
function. Euler’s textbook Introductio in analysin infinitorum of 1748 gave the first 
systematic treatment of the trigonometric functions as we know them today. Euler 
introduced our modern definitions of sine and cosine in terms of the unit circle. He
had various trigonometric identities, including $\sin^2 x + \cos^2 x = 1$ and the formula for
$\sin(x+y)$. He obtained the Maclaurin series for sine and cosine, and also the identity
$e^{ix} = \cos x + i \sin x$.


Logarithms and Exponentials 


Logarithms and exponentials arose much later than the trigonometric functions, 
presumably because they did not appear naturally in a topic of interest to the ancient 
world in the way that trigonometry arose as a tool for astronomy. 

Thomas Bradwardine, from Merton College at Oxford, in 1328 used something 
that is equivalent to what we now call an exponential function in his attempt to resolve 
a matter relating to Aristotle’s views on force and resistance. Nicole Oresme (1323— 
1382) subsequently explored such functions. He did not have a satisfactory way of 
writing exponentials, but he seemed to understand how to manipulate exponents with 
both integer and fractional powers, and raised the issue of the meaning of irrational 
powers. 

In the late 16th century, developments in fields such as astronomy and naviga- 
tion led to the need for more accurate computing. For example, 15-place tables of 
trigonometric functions were published in 1596 and 1613. John Napier (1550-1617) 
invented logarithms strictly for the purpose of making multiplication and division 
computationally easier. He had worked on developing his logarithms for many years, 
and he was, at least in part, inspired to complete and publish his approach after 
hearing about the use of trigonometric functions to convert products into sums and 
differences, which was reported to him by someone who had apparently heard about 
it while visiting the observatory of Tycho Brahe (1546-1601). Napier published his 
approach in 1614, after which logarithms gained rapid acceptance as a computational 
tool. Napier’s work was particularly noteworthy because fractional powers and the 
exponential notation had not yet been developed, and the decimal system of notation 
for fractions, though developed, had not yet been widely accepted; Napier’s use of 
decimal notation was very influential in the widespread adoption of this system in the 
17th century. The idea that a table of exponents and powers of a number (for example 
2) allowed multiplication to be done via addition was known prior to Napier, but such 
a table had increasingly large gaps, and could not be used in practice. Napier had the 
idea of producing a table without such gaps using arithmetic and geometric sequences; 
his approach involved thinking about the motion of points on a line. Similar ideas 
were developed by Jost Bürgi (1552-1632) at the same time as Napier, though the
latter published his ideas first. Napier’s definition of the logarithm of a number was 
not exactly the same as ours, and it was Henry Briggs (1561-1630) who reformulated 
Napier’s approach (initially in consultation with Napier), and developed the common 
logarithms (base 10) we know today. Briggs published a preliminary table of common 
logarithms in 1617, and an expanded version in 1624. Briggs’ table was completed 
and published by others in 1628, and these tables gained widespread use. 

Grégoire de Saint-Vincent (1584-1667), in 1647, showed some properties of the
area under the curve $xy = 1$, and in 1649 Alfonso Antonio de Sarasa (1618-1667)
observed that Grégoire de Saint-Vincent’s properties implied that the area function
for the curve $y = \frac{1}{x}$ has the same addition-to-multiplication property as logarithm,
which was close to understanding what we now express by saying that $\ln x$ is the
antiderivative of $\frac{1}{x}$.

Wallis, in his Arithmetica infinitorum of 1656, was the first person to use, and
understand, fractional exponents as we do today. 

Newton computed the power series for $\ln(1+x)$ in the mid-1660s, but he did not
publish it at the time. The series was published by Nicolaus Mercator (1620-1687),
not to be confused with the inventor of the Mercator projection Gerardus Mercator, in
Logarithmotechnia of 1668, where he used the observation of Sarasa about the curve
$y = \frac{1}{x}$, as well as ideas of Wallis. Mercator used the term “natural logarithm,” and
gave the correct ratio between natural logarithms and common logarithms.

Leibniz worked out, and Johann Bernoulli explicitly stated, the derivatives of 
logarithms. 

Euler’s Introductio in analysin infinitorum of 1748 gave the first systematic
treatment of logarithms as we now know them. Euler was the first person to think
of $\log_a x$ as the number $y$ such that $a^y = x$. He obtained the Maclaurin series for $a^x$
using the binomial series, and he then defined $\log_a x$ in terms of $a^x$. He defined $e$ as
$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots$. He had the idea that $e^x = \left(1 + \frac{x}{j}\right)^j$, where $j$ is an infinitely large
number, and this was Euler’s way of saying what we now phrase as $e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$.


Sequences


8.1 Introduction 


In a typical calculus course, sequences are usually treated very briefly, and their role 
is primarily as a prelude to the study of series. In real analysis, by contrast, sequences 
assume a much more important role. We will certainly use sequences in our study 
of series in Chapter 9, but, as will be seen in the present chapter, we will prove 
some substantial and important theorems about sequences in their own right, such as 
the Monotone Convergence Theorem (Corollary 8.3.4) and the Bolzano—Weierstrass 
Theorem (Theorem 8.3.9). As was the case for the important theorems concerning 
continuity, derivatives and integrals that we saw in previous chapters, the important 
theorems concerning sequences rely upon the Least Upper Bound Property of the real 
numbers. 

In many real analysis texts the study of sequences precedes the study of limits of 
functions. In such an approach, the proofs of some of the important theorems involving 
continuity, derivatives and integrals make use of sequences, rather than directly using 
the Least Upper Bound Property. In this text we have placed the study of sequences 
after the study of continuity, derivatives and integrals, and have offered proofs for 
those topics that more directly make use of the Least Upper Bound Property, both to 
highlight the role of the Least Upper Bound Property, and to keep to a minimum the 
technical tools needed to study continuity, derivatives and integrals. Be that as it may, 
sequences are a central topic in real analysis, no matter where they are placed in a 
textbook. We will see a few applications of sequences in Section 8.4. 


8.2 Sequences 


The reader is familiar, at least informally, with the notion of a sequence of real
numbers, for example the sequence

$$\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots.$$

E.D. Bloch, The Real Numbers and Real Analysis, DOI 10.1007/978-0-387-72177-4_ 8, 399

Intuitively, a sequence of real numbers is a collection of real numbers of which there
is a first, a second, a third and so on, with one real number for each element of N. It is 
important to distinguish between the term “sequence” and the related but not identical 
term “series,” which is the sum of a sequence, for example

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots.$$


In colloquial usage, the words “sequence” and “series” are often used interchangeably, 
but in mathematical terminology the two concepts are distinct, and these two words 
should be used in their precise meanings. (Using mathematical terminology, the final 
playoffs in American baseball should be called the “world sequence” rather than the 
“world series.”)

Although we think of a sequence informally as a collection of numbers of which 
there is a first, a second, a third and so on, that is not a rigorous definition—any 
statement with “etc.” is not entirely rigorous. The basis of the following definition is 
that we use the natural numbers as the model set of which there is intuitively a first, 
second, third and so on. Then, we can select a first, second, third and so on, elements 
of an arbitrary set by using a function from the natural numbers to that set. In other 
words, although we informally write a sequence of real numbers as $a_1, a_2, \ldots$, the
formal definition of such a sequence is a function $f\colon \mathbb{N} \to \mathbb{R}$, where we think of $f(1)$
as the first element of the sequence, of $f(2)$ as the second element of the sequence
and so on; that is, we can think of the sequence as given by $a_n = f(n)$ for all $n \in \mathbb{N}$.
As seen in the following definition, there can be sequences in any non-empty set. 


Definition 8.2.1. Let $A$ be a non-empty set. A sequence in $A$ is a function $f\colon \mathbb{N} \to A$.
If $f\colon \mathbb{N} \to A$ is a sequence, and if $a_i = f(i)$ for all $i \in \mathbb{N}$, then we will write either
$a_1, a_2, a_3, \ldots$ or $\{a_n\}_{n=1}^{\infty}$ to denote the sequence. Each number $a_n$, where $n \in \mathbb{N}$, is
called a term of the sequence $\{a_n\}_{n=1}^{\infty}$. A sequence in the set $\mathbb{R}$ is also called a
sequence of real numbers. △


It is important to recognize that a sequence is not just a set of elements, but is 
a countably infinite collection of elements in a given order. A set, by contrast, even 
if it is countably infinite, does not have an order to its elements. In other words, 
we distinguish between a sequence $\{a_n\}_{n=1}^{\infty}$ and the set of terms of the sequence
$\{a_n \mid n \in \mathbb{N}\}$. For example, if we let $\{a_n\}_{n=1}^{\infty}$ be the sequence $\{(-1)^n\}_{n=1}^{\infty}$, then
$\{a_n\}_{n=1}^{\infty}$ has infinitely many terms, but the set $\{a_n \mid n \in \mathbb{N}\} = \{-1, 1\}$ has only two
elements. Even if no two terms of a sequence are equal to each other, the sequence is
still not the same as its set of elements. For example, the two sequences $\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots$
and $\frac{1}{3}, \frac{1}{2}, \frac{1}{5}, \frac{1}{4}, \ldots$ are different as sequences, but they have the same sets of elements.
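
The distinction between a sequence and its set of terms can be made concrete with a short Python sketch. Here the sequence $\{(-1)^n\}_{n=1}^{\infty}$ is modeled, in the spirit of Definition 8.2.1, as a function on the natural numbers; the function name `a` and the decision to list only the first ten terms are illustrative assumptions.

```python
def a(n: int) -> int:
    """The sequence a_n = (-1)**n, viewed as a function on N = {1, 2, 3, ...}."""
    return (-1) ** n


first_terms = [a(n) for n in range(1, 11)]  # ordered, with repetitions kept
set_of_terms = set(first_terms)             # order and repetitions are forgotten

print(first_terms)              # [-1, 1, -1, 1, -1, 1, -1, 1, -1, 1]
print(set_of_terms == {-1, 1})  # True
```

The list keeps track of which value occurs at which position, whereas the set records only that the values $-1$ and $1$ occur.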

In Section 3.2 we discussed the notion of the limit of a function as $x \to c$, using
the $\varepsilon$-$\delta$ definition. We now give the analogous definition of the limit of a sequence.
Actually, this definition more closely resembles the definition of Type 1 limits to
infinity discussed in Section 6.2 than the definition of ordinary limits of functions
seen in Section 3.2. However, it is not assumed in the present section that the reader
is familiar with Section 6.2, except as a useful analogy for the final definition in this
section.

Informally, as is often stated in calculus courses, the intuitive idea of a limit of a
sequence $\{a_n\}_{n=1}^{\infty}$ as $n$ goes to infinity is that the value of $a_n$ gets closer and closer to
a number $L$ as the value of $n$ gets larger and larger. Of course, not every sequence has
a limit. It is important to stress that when we say “$n$ goes to infinity” we mean only
that $n$ gets larger and larger; there is no real number “$\infty$” to which $n$ is getting closer
and closer.

As we did for limits of functions, we measure “arbitrary closeness” with an
arbitrarily chosen positive number, often denoted with a symbol such as $\varepsilon$. We
rephrase the first part of the expression “the value of $a_n$ gets closer and closer to a
number $L$ as the value of $n$ gets larger and larger” by using $\varepsilon$ to denote our measure of
closeness. The crucial idea of a limit of a sequence existing is: if, for every possible
choice of $\varepsilon > 0$, no matter how small, we can show that for all $n$ sufficiently large, the
value of $a_n$ will be within distance $\varepsilon$ of $L$, then we will say that the limit of $\{a_n\}_{n=1}^{\infty}$
is $L$. We will use $N \in \mathbb{N}$ to denote the measure of largeness of $n$. Then if for each
possible choice of $\varepsilon > 0$, no matter how small, we can show that there is some $N \in \mathbb{N}$
such that for all $n$ at least as large as $N$, then $a_n$ will be within $\varepsilon$ distance of $L$, we
will say that the limit of $\{a_n\}_{n=1}^{\infty}$ as $n$ goes to infinity is $L$. To say that $a_n$ is within
distance $\varepsilon$ of $L$ is to say that $|a_n - L| < \varepsilon$. We then see that the rigorous way to say
“the value of $a_n$ gets closer and closer to a number $L$ as the value of $n$ gets larger and
larger” is to say that for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that for all $n \in \mathbb{N}$ such
that $n \ge N$, it is the case that $|a_n - L| < \varepsilon$. As seen in Figure 8.2.1, where the values
of $a_1, a_2, \ldots$ are represented by dots, the expression “for all $n \in \mathbb{N}$ such that $n \ge N$, it
is the case that $|a_n - L| < \varepsilon$” can be viewed graphically by saying that $a_n$ is within a
band of width $2\varepsilon$ centered at $L$ whenever $n \in \mathbb{N}$ and $n \ge N$.


Fig. 8.2.1. 


The reader will notice that our use of “$\varepsilon$” in the above discussion is the same
as our use of “$\varepsilon$” in the discussion of limits of functions in Section 3.2, though we
replaced the “$\delta$” (which will often be small when $\varepsilon$ is small) with an integer “$N$”
(which will often be large when $\varepsilon$ is small). Other than this replacement of the usually
small “$\delta$” with the usually large “$N$,” the following definition is virtually the same
as Definition 3.2.1. In other words, limits of functions and limits of sequences are
different in that they deal with functions that have different domains (open intervals in
$\mathbb{R}$ versus the natural numbers), but they work the same way in the codomain (which
is $\mathbb{R}$ in both cases).


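Definition 8.2.2. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$, and let $L \in \mathbb{R}$. The number $L$ is
the limit of $\{a_n\}_{n=1}^\infty$, written

\[ \lim_{n \to \infty} a_n = L, \]

if for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $|a_n - L| < \varepsilon$.
If $\lim_{n \to \infty} a_n = L$, we also say that $\{a_n\}_{n=1}^\infty$ converges to $L$. If $\{a_n\}_{n=1}^\infty$ converges to some
real number, we say that $\{a_n\}_{n=1}^\infty$ is convergent; otherwise we say that $\{a_n\}_{n=1}^\infty$ is
divergent. △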
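
The definition can be explored numerically. The following Python sketch is only an illustration and not a proof: a computer can test the condition $|a_n - L| < \varepsilon$ for finitely many $n$, whereas the definition quantifies over all $n \geq N$. The helper name `holds_from` and the cutoff `horizon` are assumptions made for this illustration.

```python
from fractions import Fraction


def holds_from(a, L, eps, N, horizon=10_000):
    """Check |a(n) - L| < eps for every n with N <= n <= horizon.

    This is a finite check only; it cannot establish a limit.
    """
    return all(abs(a(n) - L) < eps for n in range(N, horizon + 1))


def a(n):
    # The sequence a_n = 1/n, using exact rational arithmetic.
    return Fraction(1, n)


eps = Fraction(1, 100)
# For a_n = 1/n and L = 0, any N with 1/N < eps works; N = 101 is one choice.
print(holds_from(a, 0, eps, N=101))   # True
print(holds_from(a, 0, eps, N=100))   # False, because a_100 = 1/100 is not < 1/100
```

The second call shows why the choice of $N$ matters: the inequality in the definition is strict, so $N = 100$ fails for $\varepsilon = 1/100$.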


Similarly to the definition of the limit of a function, here too the order of the
quantifiers is crucial. The definition of the limit of a sequence could be written in
logical symbols as

\[ (\forall \varepsilon > 0)(\exists N \in \mathbb{N})[(n \in \mathbb{N} \wedge n \geq N) \rightarrow |a_n - L| < \varepsilon]. \]

The order of the quantifiers cannot be changed. If we want to prove that $\lim_{n \to \infty} a_n = L$,
the proof must start by choosing an arbitrary $\varepsilon > 0$. Next, after possible argumentation,
a value of $N \in \mathbb{N}$ must be given, where $N$ may depend upon $\varepsilon$. We then choose an
arbitrary $n \in \mathbb{N}$ such that $n \geq N$. Finally, again after possible argumentation, we must
deduce that $|a_n - L| < \varepsilon$. It is important that the arbitrary choices are indeed arbitrary.
A typical proof that $\lim_{n \to \infty} a_n = L$ must therefore have the following form:


Proof. Let $\varepsilon > 0$.

    (argumentation)

Let $N \in \mathbb{N}$ be such that ....

    (argumentation)

Suppose that $n \in \mathbb{N}$ and $n \geq N$.

    (argumentation)

Therefore $|a_n - L| < \varepsilon$.


Such proofs are often called “$\varepsilon$-$N$ proofs.” If you feel comfortable with $\varepsilon$-$\delta$
proofs, then you should have no trouble with $\varepsilon$-$N$ proofs.

Just as the definition of limits of sequences is very similar to the definition of
limits of functions, so too many of the lemmas, theorems and proofs in this section are
very similar to the corresponding results about limits of functions found in Section 3.2.

For our first lemma, observe that it is not stated in Definition 8.2.2 that the number
“$L$” in the definition is unique. However, it turns out that if $\lim_{n \to \infty} a_n = L$ for some $L \in \mathbb{R}$,
then there is only one such number $L$. In other words, if a sequence has a limit, that
means there is a single number $L$ that $a_n$ is getting closer and closer to as $n$ gets larger;
if there is no such number, then there is no limit.


Lemma 8.2.3. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. If $\lim_{n \to \infty} a_n = L$ for some $L \in \mathbb{R}$, then
$L$ is unique.


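Proof. Suppose that $\lim_{n \to \infty} a_n = L_1$ and $\lim_{n \to \infty} a_n = L_2$ for some $L_1, L_2 \in \mathbb{R}$ such that
$L_1 \neq L_2$. Let $\varepsilon = \frac{|L_1 - L_2|}{2}$. Then $\varepsilon > 0$. Hence there is some $N_1 \in \mathbb{N}$ such that $n \in \mathbb{N}$
and $n \geq N_1$ imply $|a_n - L_1| < \varepsilon$, and there is some $N_2 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N_2$
imply $|a_n - L_2| < \varepsilon$. Let $N = \max\{N_1, N_2\}$. Then $N \geq N_1$ and $N \geq N_2$, and hence

\[
\begin{aligned}
|L_1 - L_2| &= |L_1 - a_N + a_N - L_2| \leq |L_1 - a_N| + |a_N - L_2| \\
            &= |a_N - L_1| + |a_N - L_2| < \varepsilon + \varepsilon = 2\varepsilon = 2 \cdot \frac{|L_1 - L_2|}{2} = |L_1 - L_2|,
\end{aligned}
\]

which is a contradiction. We deduce that if $\lim_{n \to \infty} a_n = L$ for some $L \in \mathbb{R}$, then $L$ is
unique.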


Because of Lemma 8.2.3 we can refer to “the” limit of a sequence, if the limit 
exists. 


Example 8.2.4. In some parts of this example, we will first do some scratch work 
prior to the actual proof. As always, it is important to avoid confusing the scratch 
work with the proof. 


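(1) Let $c \in \mathbb{R}$, and let $\{a_n\}_{n=1}^\infty$ be the constant sequence defined by $a_n = c$ for
all $n \in \mathbb{N}$. We will prove that $\lim_{n \to \infty} a_n = c$; we could write this limit as $\lim_{n \to \infty} c = c$. Let
$\varepsilon > 0$. Let $N = 1$. Suppose that $n \in \mathbb{N}$ and $n \geq N$. Then

\[ |a_n - c| = |c - c| = 0 < \varepsilon. \]

(2) We will prove that $\lim_{n \to \infty} \frac{1}{n} = 0$.

Scratch Work We will work backwards for our scratch work. We want to find $N \in \mathbb{N}$
such that $n \in \mathbb{N}$ and $n \geq N$ imply $\left|\frac{1}{n} - 0\right| < \varepsilon$, which is the same as $\frac{1}{n} < \varepsilon$. So, it will
be a good choice to pick some $N \in \mathbb{N}$ such that $\frac{1}{N} < \varepsilon$.

Actual Proof Let $\varepsilon > 0$. By Corollary 2.6.8 (2) there is some $N \in \mathbb{N}$ such that $\frac{1}{N} < \varepsilon$.
Suppose that $n \in \mathbb{N}$ and $n \geq N$. Then

\[ \left|\frac{1}{n} - 0\right| = \frac{1}{n} \leq \frac{1}{N} < \varepsilon. \]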
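(3) We will prove that $\lim_{n \to \infty} \frac{n^2}{2n^2+1} = \frac{1}{2}$.

Scratch Work We want to find $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply
$\left|\frac{n^2}{2n^2+1} - \frac{1}{2}\right| < \varepsilon$, which is the same as $\left|\frac{-1}{4n^2+2}\right| < \varepsilon$, which is equivalent to
$\frac{1}{4n^2+2} < \varepsilon$, which in turn is the same as $\frac{1}{\varepsilon} < 4n^2 + 2$. Solving for $n$ we obtain
$n > \frac{1}{2}\sqrt{\frac{1}{\varepsilon} - 2}$. If $\frac{1}{\varepsilon} > 2$, which is the same as $\varepsilon < \frac{1}{2}$, then we will want to use
$N > \frac{1}{2}\sqrt{\frac{1}{\varepsilon} - 2}$. If $\frac{1}{\varepsilon} \leq 2$, which is the same as $\varepsilon \geq \frac{1}{2}$, then we cannot use
$\sqrt{\frac{1}{\varepsilon} - 2}$, but instead we observe that $\frac{1}{\varepsilon} \leq 2 < 4n^2 + 2$ for all $n \in \mathbb{N}$, so we can
choose an arbitrary value for $N$.

Actual Proof Let $\varepsilon > 0$. There are two cases. First, suppose that $\varepsilon \geq \frac{1}{2}$. Let $N = 1$.
Suppose that $n \in \mathbb{N}$ and $n \geq N$. Then

\[ \left|\frac{n^2}{2n^2+1} - \frac{1}{2}\right| = \left|\frac{-1}{4n^2+2}\right| = \frac{1}{4n^2+2} < \frac{1}{2} \leq \varepsilon. \]

Second, suppose that $\varepsilon < \frac{1}{2}$. By Corollary 2.6.8 (1) there is some $N \in \mathbb{N}$ such that
$N > \frac{1}{2}\sqrt{\frac{1}{\varepsilon} - 2}$, which implies, with some rearranging, that

\[ \frac{1}{4N^2+2} < \varepsilon. \]

Suppose that $n \in \mathbb{N}$ and $n \geq N$. Then

\[ \left|\frac{n^2}{2n^2+1} - \frac{1}{2}\right| = \frac{1}{4n^2+2} \leq \frac{1}{4N^2+2} < \varepsilon. \]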
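
The choice of $N$ in the scratch work for Part (3) can be made concrete. The sketch below (the function names `choose_N` and `distance` are hypothetical, introduced only for this illustration) returns $N = 1$ when $\varepsilon \geq \frac{1}{2}$ and otherwise the smallest natural number exceeding $\frac{1}{2}\sqrt{\frac{1}{\varepsilon} - 2}$, and then spot-checks the inequality on a finite range. Floating-point arithmetic is used, so this is an illustration rather than a proof.

```python
import math


def choose_N(eps):
    """Return an N that works for a_n = n^2 / (2n^2 + 1) and L = 1/2."""
    if eps >= 0.5:
        return 1
    bound = 0.5 * math.sqrt(1.0 / eps - 2.0)
    return math.floor(bound) + 1  # smallest integer strictly greater than bound


def distance(n):
    # |n^2/(2n^2+1) - 1/2| simplifies to 1/(4n^2 + 2).
    return 1.0 / (4 * n * n + 2)


for eps in (1.0, 0.1, 0.001):
    N = choose_N(eps)
    # Finite spot check of the conclusion |a_n - 1/2| < eps for n >= N.
    assert all(distance(n) < eps for n in range(N, N + 1000))
    print(eps, N)
```

The two branches of `choose_N` correspond to the two cases of the actual proof.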


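(4) We will prove that $\{(-1)^n\}_{n=1}^\infty$ is divergent. Suppose that $\lim_{n \to \infty} (-1)^n = L$ for
some $L \in \mathbb{R}$. Let $\varepsilon = \frac{1}{2}$. Then there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply
$|(-1)^n - L| < \frac{1}{2}$. Choose $n_1, n_2 \in \mathbb{N}$ such that $n_1 \geq N$ and $n_1$ is odd, and that $n_2 \geq N$
and $n_2$ is even. Then

\[
\begin{aligned}
2 = |(-1) - 1| &= |(-1)^{n_1} - (-1)^{n_2}| = |(-1)^{n_1} - L + L - (-1)^{n_2}| \\
               &\leq |(-1)^{n_1} - L| + |L - (-1)^{n_2}| < \frac{1}{2} + \frac{1}{2} = 1,
\end{aligned}
\]

which is a contradiction. We conclude that the sequence is divergent. ♦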
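
The contradiction in Part (4) rests on the fact that an odd-indexed term and an even-indexed term of $\{(-1)^n\}_{n=1}^\infty$ are at distance 2 from each other, so they cannot both lie within $\frac{1}{2}$ of a common number $L$. The following Python sketch checks this for a handful of candidate values of $L$; the candidates and the function names are chosen here purely for illustration, and checking finitely many values is of course not a substitute for the proof.

```python
def term(n):
    return (-1) ** n


def both_within_half(L, n_odd, n_even):
    """Return True if the two given terms both lie within 1/2 of L."""
    return abs(term(n_odd) - L) < 0.5 and abs(term(n_even) - L) < 0.5


candidates = [-1, -0.5, 0, 0.5, 1]
print(any(both_within_half(L, 101, 102) for L in candidates))  # False
```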
A very simple, but very useful, observation about limits of sequences is that if
finitely many terms of a sequence are changed, it does not change whether or not the 
sequence is convergent, and if the sequence is convergent, it does not change the limit 
of the sequence. A proof of this fact is given in Exercise 8.2.3. 

For our next definition, recall from Section 1.7 or 2.2 the definition of a subset of 
R being bounded. 


Definition 8.2.5. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. The sequence $\{a_n\}_{n=1}^\infty$ is bounded
above, bounded below or bounded if the set $\{a_n \mid n \in \mathbb{N}\}$ is bounded above, bounded
below or bounded, respectively. △


By Exercise 2.3.11 we know that a sequence $\{a_n\}_{n=1}^\infty$ is bounded if and only if
there is some $M \in \mathbb{R}$ such that $|a_n| \leq M$ for all $n \in \mathbb{N}$; it is always possible to choose
$M$ so that $M > 0$.

In contrast to a number of other theorems and lemmas in this section, which have
exact analogs for limits of functions, the following lemma has only a partial analog.
Whereas it is possible for a limit of the form $\lim_{x \to c} f(x)$ to exist even though the function
$f$ is not bounded, the discrete nature of the natural numbers leads to the fact that
any convergent sequence is bounded, as we now prove. The closest analog of the
following lemma for limits of functions is Lemma 3.2.7.


Lemma 8.2.6. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. If $\{a_n\}_{n=1}^\infty$ is convergent, then
$\{a_n\}_{n=1}^\infty$ is bounded.


Proof. Suppose that $\{a_n\}_{n=1}^\infty$ is convergent. Let $L = \lim_{n \to \infty} a_n$. Then there is some
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $|a_n - L| < 1$, which by Lemma 2.3.9 (7)
implies $|a_n| - |L| < 1$, and hence $|a_n| < |L| + 1$. Let

\[ M = \max\{|a_1|, |a_2|, \ldots, |a_{N-1}|, |L| + 1\}. \]

It then follows that $|a_k| \leq M$ for all $k \in \mathbb{N}$. Therefore $\{a_n\}_{n=1}^\infty$ is bounded.
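
The construction of $M$ in this proof can be mirrored computationally for a specific sequence. The sketch below assumes the sequence $a_n = 3 + \frac{5(-1)^n}{n}$ with $L = 3$ (an example chosen here, not taken from the text), for which $|a_n - L| = \frac{5}{n} < 1$ exactly when $n \geq 6$. It shows why the finitely many early terms must be included in the maximum: here $|a_2| = \frac{11}{2}$ exceeds $|L| + 1 = 4$.

```python
from fractions import Fraction


def a(n):
    # Illustrative sequence (an assumption for this sketch): a_n = 3 + 5(-1)^n / n.
    return 3 + Fraction(5 * (-1) ** n, n)


L = 3
N = 6  # |a_n - L| = 5/n < 1 exactly when n >= 6

# M = max{|a_1|, ..., |a_{N-1}|, |L| + 1}, as in the proof.
M = max([abs(a(k)) for k in range(1, N)] + [abs(L) + 1])
print(M)  # 11/2

# Spot check of |a_k| <= M on a finite range (not a proof).
assert all(abs(a(k)) <= M for k in range(1, 5001))
```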


Although Lemma 8.2.6 shows that a convergent sequence is bounded, the converse
is not true. For example, the sequence $\{(-1)^n\}_{n=1}^\infty$ is clearly bounded, but we saw in
Example 8.2.4 (4) that it is divergent.

The following lemma is the analog of Lemma 3.2.8. The reader is asked in
Exercise 8.2.6 to show that the hypothesis of boundedness in the following lemma
cannot be dropped.


Lemma 8.2.7. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$. Suppose that $\lim_{n \to \infty} a_n = 0$,
and that $\{b_n\}_{n=1}^\infty$ is bounded. Then $\lim_{n \to \infty} a_n b_n = 0$.


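Proof. Let $\varepsilon > 0$. Because $\{b_n\}_{n=1}^\infty$ is bounded, there is some $M \in \mathbb{R}$ such that
$|b_n| \leq M$ for all $n \in \mathbb{N}$; we may assume that $M > 0$. Because $\lim_{n \to \infty} a_n = 0$, there is some
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $|a_n - 0| < \frac{\varepsilon}{M}$. Suppose that $n \in \mathbb{N}$ and $n \geq N$.
Then

\[ |a_n b_n - 0| = |a_n b_n| = |a_n| \cdot |b_n| < \frac{\varepsilon}{M} \cdot M = \varepsilon. \]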
Example 8.2.8. By Example 8.2.4 (2) we know that $\lim_{n \to \infty} \frac{1}{n} = 0$. The sequence
$\{\sin n\}_{n=1}^\infty$ is bounded, because $|\sin x| \leq 1$ for all $x \in \mathbb{R}$ by Theorem 7.3.9 (4).
Lemma 8.2.7 then implies that $\lim_{n \to \infty} \frac{\sin n}{n} = 0$. ♦
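
A quick numerical look at this example may be helpful. Because $|\sin n| \leq 1$, we have $\left|\frac{\sin n}{n}\right| \leq \frac{1}{n}$, so any $N$ with $\frac{1}{N} < \varepsilon$ works for the product sequence. The Python sketch below tests this on a finite range of indices; the function name `within_eps_from` and the range length are assumptions made for this illustration, and a finite check is not a proof.

```python
import math
from fractions import Fraction


def within_eps_from(eps, count=10_000):
    """Pick N with 1/N < eps and test |sin(n)/n| < eps for N <= n < N + count."""
    N = math.floor(1 / eps) + 1          # guarantees 1/N < eps
    ok = all(abs(math.sin(n) / n) < eps for n in range(N, N + count))
    return N, ok


print(within_eps_from(Fraction(1, 1000)))  # (1001, True)
```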


We now see that limits of sequences behave nicely with respect to the addition, 
subtraction, multiplication and division of the terms of sequences. 


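Theorem 8.2.9. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$, and let $k \in \mathbb{R}$. Suppose
that $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ are convergent.

1. $\{a_n + b_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} (a_n + b_n) = \lim_{n \to \infty} a_n + \lim_{n \to \infty} b_n$.

2. $\{a_n - b_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} (a_n - b_n) = \lim_{n \to \infty} a_n - \lim_{n \to \infty} b_n$.

3. $\{k a_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} k a_n = k \lim_{n \to \infty} a_n$.

4. $\{a_n b_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} a_n b_n = [\lim_{n \to \infty} a_n] \cdot [\lim_{n \to \infty} b_n]$.

5. If $\lim_{n \to \infty} b_n \neq 0$, then $\left\{\frac{a_n}{b_n}\right\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} \frac{a_n}{b_n} = \frac{\lim_{n \to \infty} a_n}{\lim_{n \to \infty} b_n}$.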


Proof. We will prove Part (4), leaving the rest to the reader in Exercise 8.2.11. 
The proofs of the various parts of this theorem are analogous to the proofs of the 
corresponding parts of Theorem 3.2.10. 


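(4) Let $L = \lim_{n \to \infty} a_n$ and $M = \lim_{n \to \infty} b_n$. Let $\varepsilon > 0$. By Lemma 8.2.6 we know that
$\{b_n\}_{n=1}^\infty$ is bounded. Hence there is some $B \in \mathbb{R}$ such that $|b_n| \leq B$ for all $n \in \mathbb{N}$. We
may assume that $B > 0$. Then $B + |L| > 0$. There is some $N_1 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \geq N_1$ imply $|a_n - L| < \frac{\varepsilon}{B + |L|}$, and there is some $N_2 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N_2$
imply $|b_n - M| < \frac{\varepsilon}{B + |L|}$. Let $N = \max\{N_1, N_2\}$. Suppose that $n \in \mathbb{N}$ and $n \geq N$. Then

\[
\begin{aligned}
|a_n b_n - LM| &= |a_n b_n - b_n L + b_n L - LM| \leq |b_n| \cdot |a_n - L| + |L| \cdot |b_n - M| \\
               &< B \cdot \frac{\varepsilon}{B + |L|} + |L| \cdot \frac{\varepsilon}{B + |L|} = \varepsilon.
\end{aligned}
\]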


Theorem 8.2.9 has both theoretical and practical uses, a simple example of the 
latter being the following example. 


Example 8.2.10. We will prove that $\lim_{n \to \infty} \frac{n}{n+1} = 1$. Although it would be possible to
provide an $\varepsilon$-$N$ proof for this limit, we will use Theorem 8.2.9 to give an easier proof.
Observe that if $n \in \mathbb{N}$, then

\[ \frac{n}{n+1} = \frac{1}{1 + \frac{1}{n}}. \]

We saw in Example 8.2.4 (1) (2) that $\lim_{n \to \infty} 1 = 1$ and $\lim_{n \to \infty} \frac{1}{n} = 0$. We can then apply
Theorem 8.2.9 (1) (5) to deduce that

\[ \lim_{n \to \infty} \frac{n}{n+1} = \lim_{n \to \infty} \frac{1}{1 + \frac{1}{n}} = \frac{\lim_{n \to \infty} 1}{\lim_{n \to \infty} 1 + \lim_{n \to \infty} \frac{1}{n}} = \frac{1}{1 + 0} = 1. \]
♦

Our next result is the analog of Theorem 3.2.13, with the one modification that it
is sufficient if the term-by-term comparison of the two sequences starts after finitely
many terms.


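Theorem 8.2.11. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$. Suppose that there is
some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $a_n \leq b_n$. If $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ are
convergent, then $\lim_{n \to \infty} a_n \leq \lim_{n \to \infty} b_n$.


Proof. Suppose that $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ are convergent. Let $L = \lim_{n \to \infty} a_n$ and
$M = \lim_{n \to \infty} b_n$. Suppose that $M < L$. Let $\varepsilon = \frac{L - M}{2}$. Then $\varepsilon > 0$. Hence there is some $N_1 \in \mathbb{N}$
such that $n \in \mathbb{N}$ and $n \geq N_1$ imply $|a_n - L| < \varepsilon$, and there is some $N_2 \in \mathbb{N}$ such that
$n \in \mathbb{N}$ and $n \geq N_2$ imply $|b_n - M| < \varepsilon$. Let $P = \max\{N, N_1, N_2\}$. Then $|a_P - L| < \varepsilon$
and $|b_P - M| < \varepsilon$. It follows that $L - \varepsilon < a_P < L + \varepsilon$ and $M - \varepsilon < b_P < M + \varepsilon$, and
hence

\[ b_P < M + \varepsilon = M + \frac{L - M}{2} = \frac{L + M}{2} = L - \frac{L - M}{2} = L - \varepsilon < a_P, \]

which is a contradiction to the fact that $a_n \leq b_n$ for all $n \in \mathbb{N}$ such that $n \geq N$. Therefore $L \leq M$.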


The following theorem provides a convenient way to find the limit of a sequence 
by “trapping it” between two sequences that have limits that can be dealt with more 
easily. 


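Theorem 8.2.12 (Squeeze Theorem for Sequences). Let $\{a_n\}_{n=1}^\infty$, $\{b_n\}_{n=1}^\infty$ and
$\{c_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$. Suppose that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \geq N$ imply $a_n \leq b_n \leq c_n$. If $\{a_n\}_{n=1}^\infty$ and $\{c_n\}_{n=1}^\infty$ are convergent and
$\lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n$, then $\{b_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} b_n = \lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n$.


Proof. Suppose that $\{a_n\}_{n=1}^\infty$ and $\{c_n\}_{n=1}^\infty$ are convergent and $\lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n$. Let
$L = \lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n$. Let $\varepsilon > 0$. Then there is some $N_1 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \geq N_1$ imply $|a_n - L| < \varepsilon$, and there is some $N_2 \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N_2$
imply $|c_n - L| < \varepsilon$. Let $P = \max\{N, N_1, N_2\}$. Suppose that $n \in \mathbb{N}$ and $n \geq P$. Then
$a_n \leq b_n \leq c_n$, and $|a_n - L| < \varepsilon$ and $|c_n - L| < \varepsilon$. It follows that $L - \varepsilon < a_n < L + \varepsilon$
and $L - \varepsilon < c_n < L + \varepsilon$, and hence

\[ L - \varepsilon < a_n \leq b_n \leq c_n < L + \varepsilon. \]

Therefore $|b_n - L| < \varepsilon$.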


The following useful example is, in part, an application of the Squeeze Theorem 
for Sequences (Theorem 8.2.12). 


Example 8.2.13. Let $r \in \mathbb{R}$. We want to examine the convergence or divergence of
the sequence $\{r^n\}_{n=1}^\infty$. There are six cases.

First, suppose that $0 < r < 1$. We will show that $\{r^n\}_{n=1}^\infty$ is convergent, and that
$\lim_{n \to \infty} r^n = 0$. It follows from the hypothesis on $r$ that $\frac{1}{r} > 1$. Let $q = \frac{1}{r} - 1$. Then
$q > 0$ and $\frac{1}{r} = 1 + q$. It now follows from Exercise 2.5.13 (1) that if $n \in \mathbb{N}$ then
$\frac{1}{r^n} = \left(\frac{1}{r}\right)^n = (1 + q)^n \geq 1 + nq > nq$. Hence $0 < r^n < \frac{1}{nq}$ for all $n \in \mathbb{N}$. It follows from
Example 8.2.4 (1) that $\lim_{n \to \infty} 0 = 0$, and from Example 8.2.4 (2) and Theorem 8.2.9 (3)
that $\lim_{n \to \infty} \frac{1}{nq} = 0$. Because $0 < r^n < \frac{1}{nq}$ for all $n \in \mathbb{N}$, then by the Squeeze Theorem
for Sequences (Theorem 8.2.12) we deduce that $\lim_{n \to \infty} r^n = 0$.

Second, suppose that $r > 1$. We will show that $\{r^n\}_{n=1}^\infty$ is divergent. By using
Lemma 8.2.6, it will suffice to show that $\{r^n\}_{n=1}^\infty$ is not bounded. Let $M \in \mathbb{R}$. If $M \leq 0$,
then $r^n > M$ for all $n \in \mathbb{N}$. Now suppose that $M > 0$. It follows from the hypothesis
on $r$ that $0 < \frac{1}{r} < 1$, and hence by the previous paragraph we know that $\lim_{n \to \infty} \left(\frac{1}{r}\right)^n = 0$.
Therefore there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $\left|\left(\frac{1}{r}\right)^n - 0\right| < \frac{1}{M}$. It
follows that if $n \in \mathbb{N}$ and $n \geq N$, then $r^n = |r^n| > M$. Hence $\{r^n\}_{n=1}^\infty$ is not bounded
above, and therefore it is not bounded.

Third, suppose that $r = 1$ or $r = 0$. Then $\lim_{n \to \infty} r^n = \lim_{n \to \infty} 1 = 1$ or $\lim_{n \to \infty} r^n = \lim_{n \to \infty} 0 = 0$
by Example 8.2.4 (1).

Fourth, suppose that $-1 < r < 0$. Then $0 < |r| < 1$, and by the first case, we see
that $\lim_{n \to \infty} |r^n| = \lim_{n \to \infty} |r|^n = 0$. It then follows from Exercise 8.2.13 (2) that $\lim_{n \to \infty} r^n = 0$.

Fifth, suppose that $r = -1$. Then $\{r^n\}_{n=1}^\infty$ is divergent by Example 8.2.4 (4).

Sixth, suppose that $r < -1$. We will show that $\{r^n\}_{n=1}^\infty$ is divergent, once again
by showing that $\{r^n\}_{n=1}^\infty$ is not bounded. Let $M \in \mathbb{R}$. Suppose that $M > 0$; the case
where $M \leq 0$ is similar, and we omit the details. It follows from the hypothesis on $r$
that $|r| > 1$. Using the argument given in the second case, we know that there is some
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $|r|^n > M$. It follows that if $n \in \mathbb{N}$ and $n \geq N$
and $n$ is even, then $r^n = |r|^n > M$. Hence $\{r^n\}_{n=1}^\infty$ is not bounded.

Putting all of the above cases together, we see that $\{r^n\}_{n=1}^\infty$ is convergent if and
only if $-1 < r \leq 1$. If $-1 < r < 1$ then $\lim_{n \to \infty} r^n = 0$, and if $r = 1$ then $\lim_{n \to \infty} r^n = 1$. ♦
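
The summary of the six cases can be recorded as a small Python function, together with a finite spot check of the key inequality $0 < r^n < \frac{1}{nq}$ from the first case. The function name `classify` and the sample values of $r$ are choices made for this sketch; the floating-point powers printed at the end only suggest the behavior and prove nothing.

```python
def classify(r):
    """Return the behaviour of {r^n} according to the six cases above."""
    if -1 < r < 1:
        return "converges to 0"
    if r == 1:
        return "converges to 1"
    return "diverges"


# First case: with q = 1/r - 1 > 0, Bernoulli's inequality gives 0 < r^n < 1/(nq).
r = 0.9
q = 1 / r - 1
assert all(0 < r ** n < 1 / (n * q) for n in range(1, 200))

for r in (0.5, -0.9, 1, -1, 1.5, -2):
    print(r, classify(r), r ** 50)
```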


Similarly to the Type 2 limits to infinity for functions that were discussed in 
Section 6.2, it is also possible to define what it means for a sequence to diverge to 
infinity or negative infinity. 


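Definition 8.2.14. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. The sequence $\{a_n\}_{n=1}^\infty$ diverges
to infinity, written

\[ \lim_{n \to \infty} a_n = \infty, \]

if for each $P \in \mathbb{R}$, there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $a_n > P$. The
sequence $\{a_n\}_{n=1}^\infty$ diverges to negative infinity, written

\[ \lim_{n \to \infty} a_n = -\infty, \]

if for each $Q \in \mathbb{R}$, there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $a_n < Q$. △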
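
In this definition the roles of $\varepsilon$ and $N$ are played by $P$ and $N$: given any bound $P$, we must produce an $N$ past which every term exceeds $P$. As a sketch under the assumption that the sequence is $a_n = n^2$ (an example chosen here for illustration), the following Python function returns one such $N$; the name `N_for` is hypothetical.

```python
import math


def N_for(P):
    """For a_n = n^2, return some N with a_n > P whenever n >= N."""
    if P < 0:
        return 1
    # N = isqrt(floor(P)) + 1 satisfies N^2 >= floor(P) + 1 > P,
    # and n^2 is increasing in n, so n^2 > P for all n >= N.
    return math.isqrt(math.floor(P)) + 1


for P in (-5, 0, 10, 16, 15.5, 10**6):
    N = N_for(P)
    assert all(n * n > P for n in range(N, N + 1000))  # finite spot check
    print(P, N)
```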


Observe that we say that a sequence “diverges to infinity,” and not that it
“converges to infinity,” because convergence always means convergence to a real number,
and there is no real number that is infinite.
Reflections


Nowhere in an introductory real analysis course is the difference in emphasis 
between such a course and a calculus course more immediately apparent than in 
the role of sequences. In a calculus course, sequences per se receive very brief 
treatment, usually only as minimally needed for use as partial sums of series. In a real 
analysis course, by contrast, sequences have a very important role to play—even in the 
present text, where sequences are located in the same place as in a calculus course, as 
opposed to some real analysis texts, which locate sequences earlier. The reason for the 
difference in the role of sequences in a real analysis course and a calculus course is the 
importance of sequences as a tool for rigorous proofs, as seen in the applications of 
sequences found in Section 8.4, though this collection of applications is but a sampling 
of the wide use of sequences. Ultimately, the value of sequences is that by using 
theorems such as the Monotone Convergence Theorem and the Bolzano—Weierstrass 
Theorem, both found in Section 8.3 and both equivalent to the Least Upper Bound 
Property of the real numbers, we have additional—and sometimes simpler—ways of 
using that property. 

Having stressed the importance of sequences, the most immediate impression one 
gets upon first encounter with the introductory material about sequences in the present 
section is its similarity to the material concerning limits of functions in Section 3.2. 
In fact, both limits of sequences and limits of functions are special cases of a more 
general type of limit based upon the idea of directed sets; see [Bea97] for details. 
Hence, it is not a coincidence that some of the definitions, theorems and proofs 
concerning limits of sequences are so similar to their analogs for limits of functions. If 
the reader finds the material about limits of sequences easier than limits of functions,
that is in no small measure because the reader has already gained experience with
limits of functions, and hence, in contrast to the reader’s initial encounter with that
material, which was likely her first encounter with $\varepsilon$-$\delta$ arguments, by now the variant
of such arguments seen in the present section is nothing new. Moreover, after some
of the tricky proofs we saw in Chapter 5, the proofs in the present section are indeed 
easy by comparison. It is only when we get to the more substantial theorems about 
sequences in Section 8.3 that we start to see tricky proofs. 


Exercises 


Exercise 8.2.1. [Used in Example 9.2.4.] Use only the definition of limits of sequences 
for each of the following proofs. 
L\” ; 

(1) Prove that {4 Jt is convergent. 

(2) Prove that { a ted is convergent. 

(3) Prove that $\{n\}_{n=1}^\infty$ is divergent.

Exercise 8.2.2. [Used in Example 9.4.5, Example 9.5.7 and Example 10.5.2.] Let $b, c \in \mathbb{R}$.
Suppose that $b \neq c$. Let $\{a_n\}_{n=1}^\infty$ be defined by

\[ a_n = \begin{cases} b, & \text{if } n \text{ is even} \\ c, & \text{if } n \text{ is odd.} \end{cases} \]

Prove that $\{a_n\}_{n=1}^\infty$ is divergent.


Exercise 8.2.3. [Used in Section 8.2.] Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$.
Suppose that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $a_n = b_n$. Prove that
$\{a_n\}_{n=1}^\infty$ is convergent if and only if $\{b_n\}_{n=1}^\infty$ is convergent, and if they are convergent
then $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n$.


Exercise 8.2.4. [Used in Example 9.2.4, Theorem 9.2.5, Theorem 9.4.15 and
Corollary 10.4.15.] Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Let $r \in \mathbb{N}$, and let $\{b_n\}_{n=1}^\infty$ be the
sequence defined by $b_n = a_{n+r}$ for all $n \in \mathbb{N}$. Prove that $\{a_n\}_{n=1}^\infty$ is convergent if and
only if $\{b_n\}_{n=1}^\infty$ is convergent, and if they are convergent then $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n$.


Exercise 8.2.5. [Used in Exercise 8.2.11, Theorem 9.4.7, Theorem 9.4.15 and
Lemma 10.4.13.] Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$, and let $L \in \mathbb{R}$. Using only the definition
of limits of sequences, prove that $\{a_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} a_n = L$ if and only if
$\{a_n - L\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} (a_n - L) = 0$.


Exercise 8.2.6. [Used in Section 8.2.] Find an example of sequences $\{a_n\}_{n=1}^\infty$ and
$\{b_n\}_{n=1}^\infty$ in $\mathbb{R}$ such that $\{a_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} a_n = 0$, and that $\{a_n b_n\}_{n=1}^\infty$ is
divergent.

Exercise 8.2.7. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Suppose that $\{a_n\}_{n=1}^\infty$ is convergent.
Prove that if $\lim_{n \to \infty} a_n > 0$, then there is some $M > 0$ and some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \geq N$ imply $a_n > M$. (This exercise is the analog for sequences of the Sign-Preserving
Property for Limits (Theorem 3.2.4).)


Exercise 8.2.8. [Used in Theorem 9.4.15.] Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in
$\mathbb{R}$. Suppose that $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ are convergent. Prove that if $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n$,
then $\{\min\{a_n, b_n\}\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} \min\{a_n, b_n\} = \lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n$.


Exercise 8.2.9. [Used in Theorem 8.4.1, Exercise 8.4.9, Exercise 8.4.14 and
Exercise 8.4.15.] Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$, and let $L \in \mathbb{R}$. Suppose that
$\{b_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} b_n = 0$, and that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$
and $n \geq N$ imply $|a_n - L| \leq b_n$. Prove that $\{a_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} a_n = L$.


Exercise 8.2.10. [Used in Example 8.4.3 and Example 9.3.7.] Let $\{a_n\}_{n=1}^\infty$ be a
sequence in $\mathbb{R}$, and let $f \colon [1, \infty) \to \mathbb{R}$ be a function. Suppose that $f(n) = a_n$ for all
$n \in \mathbb{N}$.


(1) Prove that if $\lim_{x \to \infty} f(x)$ exists, then $\{a_n\}_{n=1}^\infty$ is convergent and
$\lim_{n \to \infty} a_n = \lim_{x \to \infty} f(x)$.

(2) This part of the exercise makes use of Exercise 6.2.15. Prove that if
$\lim_{x \to \infty} f(x) = \infty$, then $\lim_{n \to \infty} a_n = \infty$.

(3) Find an example of a sequence $\{a_n\}_{n=1}^\infty$ and a function $f$ that satisfy the
hypotheses of this exercise, and such that $\{a_n\}_{n=1}^\infty$ is convergent and $\lim_{x \to \infty} f(x)$
does not exist.


Exercise 8.2.11. [Used in Theorem 8.2.9.] Prove Theorem 8.2.9 (1) (2) (3) (5).
[Use Exercise 8.2.5.]


Exercise 8.2.12. [Used in Example 9.2.4.] Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in
$\mathbb{R}$, and let $k \in \mathbb{R}$. Suppose that $\{a_n\}_{n=1}^\infty$ is divergent and $\{b_n\}_{n=1}^\infty$ is convergent.


(1) Prove that $\{a_n + b_n\}_{n=1}^\infty$ is divergent.

(2) Prove that $\{a_n - b_n\}_{n=1}^\infty$ is divergent.

(3) Prove that if $k \neq 0$, then $\{k a_n\}_{n=1}^\infty$ is divergent.

(4) Find an example of sequences $\{c_n\}_{n=1}^\infty$ and $\{d_n\}_{n=1}^\infty$ in $\mathbb{R}$ such that $\{c_n\}_{n=1}^\infty$
and $\{d_n\}_{n=1}^\infty$ are divergent and $\{c_n + d_n\}_{n=1}^\infty$ is convergent.


Exercise 8.2.13. [Used in Example 8.2.13, Theorem 9.4.15, Exercise 10.3.5,
Corollary 10.4.15 and Theorem 10.5.2.] Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$.

(1) Prove that if $\{a_n\}_{n=1}^\infty$ is convergent, then $\{|a_n|\}_{n=1}^\infty$ is convergent and
$\lim_{n \to \infty} |a_n| = \left|\lim_{n \to \infty} a_n\right|$.

(2) Prove that $\{a_n\}_{n=1}^\infty$ is convergent and $\lim_{n \to \infty} a_n = 0$ if and only if $\{|a_n|\}_{n=1}^\infty$ is
convergent and $\lim_{n \to \infty} |a_n| = 0$.

(3) Find an example of a sequence $\{b_n\}_{n=1}^\infty$ in $\mathbb{R}$ such that $\{|b_n|\}_{n=1}^\infty$ is convergent
and $\{b_n\}_{n=1}^\infty$ is divergent.


Exercise 8.2.14. [Used in Section 8.3.] Let $r \in \mathbb{R}$. Prove that there is a sequence
$\{q_n\}_{n=1}^\infty$ in $\mathbb{Q}$ such that $\lim_{n \to \infty} q_n = r$.


Exercise 8.2.15. [Used in Example 9.5.2 and Example 9.2.4.] Using only
Definition 8.2.14, prove that $\lim_{n \to \infty} n = \infty$.


Exercise 8.2.16. [Used in Theorem 10.5.2.] Let $r \in (1, \infty)$. Prove that $\lim_{n \to \infty} r^n = \infty$.


Exercise 8.2.17. [Used in Example 9.2.4, Example 9.5.2 and Theorem 10.5.2.] Let
$\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$, and let $k \in \mathbb{R}$. Suppose that $\lim_{n \to \infty} a_n = \infty$ and
$\{b_n\}_{n=1}^\infty$ is convergent.


(1) Prove that $\lim_{n \to \infty} (a_n + b_n) = \infty$.

(2) Prove that $\lim_{n \to \infty} (a_n + k) = \infty$.

(3) Prove that if $k > 0$ then $\lim_{n \to \infty} k a_n = \infty$.


Exercise 8.2.18. [Used in Example 9.2.4 and Theorem 10.5.2.] Let $\{a_n\}_{n=1}^\infty$ and
$\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$. Suppose that $\lim_{n \to \infty} a_n = \infty$.


(1) Suppose that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $a_n \leq b_n$.
Prove that $\lim_{n \to \infty} b_n = \infty$.


(2) Suppose that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply $a_n \leq |b_n|$.
Prove that $\{b_n\}_{n=1}^\infty$ is divergent.
The basic properties of sequences that we saw in Section 8.2 rely only upon the
algebraic properties of the real numbers; that is, these properties would also hold for 
sequences in the set of rational numbers. We now turn to three important theorems 
about sequences, all of which rely upon the Least Upper Bound Property of the real 
numbers. 

As motivation for what we will see in this section, recall Theorem 5.4.7, which 
gave a characterization of integrability that does not require a guess as to the value of 
the integral. That theorem stated, essentially, that if all the Riemann sums of a function 
on a non-degenerate closed bounded interval became closer and closer to each other 
for partitions with smaller and smaller norms, then the function is integrable. Similarly, 
it would be nice to be able to prove that a sequence is convergent without having 
to guess the value of the limit of the sequence, because in some situations it is not 
feasible to make such a guess. In particular, we will now see the (much simpler) 
analog of Theorem 5.4.7 for sequences, which states intuitively that a sequence is 
convergent if and only if the terms of the sequence get closer and closer to each other 
as n gets larger and larger. This characterization of the convergence of sequences, 
called the Cauchy Completeness Theorem, will be given in Corollary 8.3.16 below. 
(There is an analogous result for limits of functions given in Exercise 3.2.18, but the 
version for sequences is more widely known, and much more widely used.) Before 
we will be ready to prove that result, however, we will need some other important 
concepts and theorems about sequences; these other theorems do not provide complete 
characterizations of the convergence of sequences, but they state some conditions that 
guarantee convergence, and are useful in their own right. 

The first of the three important theorems of this section, called the Monotone 
Convergence Theorem, needs the following definition, which is analogous to Defini- 
tion 4.5.1. 


Definition 8.3.1. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$.


1. The sequence $\{a_n\}_{n=1}^\infty$ is increasing if $n < m$ implies $a_n \leq a_m$ for all $n, m \in \mathbb{N}$.

2. The sequence $\{a_n\}_{n=1}^\infty$ is strictly increasing if $n < m$ implies $a_n < a_m$ for all
$n, m \in \mathbb{N}$.

3. The sequence $\{a_n\}_{n=1}^\infty$ is decreasing if $n < m$ implies $a_n \geq a_m$ for all $n, m \in \mathbb{N}$.

4. The sequence $\{a_n\}_{n=1}^\infty$ is strictly decreasing if $n < m$ implies $a_n > a_m$ for all
$n, m \in \mathbb{N}$.

5. The sequence $\{a_n\}_{n=1}^\infty$ is monotone if it is either increasing or decreasing.

6. The sequence $\{a_n\}_{n=1}^\infty$ is strictly monotone if it is either strictly increasing
or strictly decreasing. △


Some books use the terms “non-decreasing” and “increasing” to mean what we 
call “increasing” and “strictly increasing,” respectively, and similarly for decreasing. 
As was the case with functions, there is no definitive terminology here. 


Example 8.3.2. 


(1) We will show that the sequence {2 _ is strictly increasing, and hence it is 
strictly monotone. Let us denote this sequence by $\{a_n\}_{n=1}^\infty$. One approach is as follows.


Let n,m €N, and suppose that n < m. Then dm — dn = 5 5 - a CESNCEE > 0, 


and therefore $a_n < a_m$. Hence $\{a_n\}_{n=1}^\infty$ is strictly increasing. Another approach is to
consider function f: [1,00) + R defined by f(x) = “45 for all x € [1,00). Then 


f(xy = wae for all x € (1,00). Hence f’(x) > 0 for all x € (1,0), and it follows 


from Theorem 4.5.2 (2) that $f$ is strictly increasing. By restricting $f$ to the natural
numbers, we see that $\{a_n\}_{n=1}^\infty$ is strictly increasing.

(2) The sequence $\{(-1)^n\}_{n=1}^\infty$ is not monotone; we omit the details. ♦


The following theorem, though not very difficult to prove, will immediately
imply the Monotone Convergence Theorem (Corollary 8.3.4). For the proof of
Theorem 8.3.3, it is important to keep in mind the difference between a sequence $\{a_n\}_{n=1}^\infty$,
and the set of all of its terms $\{a_n \mid n \in \mathbb{N}\}$. The set $\{a_n \mid n \in \mathbb{N}\}$ is not a sequence,
and cannot have a limit, though it can have a least upper bound or a greatest lower
bound. The idea of this theorem is that if a sequence is increasing, then it can do one
of two things, namely, either increase without bound, in which case the sequence
diverges to infinity, or not increase without bound, in which case the sequence is
bounded above, and it would then seem plausible that there is some number to which
the sequence converges.


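Theorem 8.3.3. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$.


1. If $\{a_n\}_{n=1}^\infty$ is increasing and bounded above, then $\{a_n\}_{n=1}^\infty$ is convergent and
$\lim_{n \to \infty} a_n = \operatorname{lub}\{a_n \mid n \in \mathbb{N}\}$.

2. If $\{a_n\}_{n=1}^\infty$ is increasing and not bounded above, then $\lim_{n \to \infty} a_n = \infty$.

3. If $\{a_n\}_{n=1}^\infty$ is decreasing and bounded below, then $\{a_n\}_{n=1}^\infty$ is convergent and
$\lim_{n \to \infty} a_n = \operatorname{glb}\{a_n \mid n \in \mathbb{N}\}$.

4. If $\{a_n\}_{n=1}^\infty$ is decreasing and not bounded below, then $\lim_{n \to \infty} a_n = -\infty$.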


Proof. We will prove Part (1), leaving Part (2) to the reader in Exercise 8.3.4; the 
other two parts are similar, and we omit the details. 


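(1) Suppose that $\{a_n\}_{n=1}^\infty$ is increasing and bounded above. Let $A = \{a_n \mid n \in \mathbb{N}\}$.
Then $A$ is bounded above. Clearly $A \neq \emptyset$, and hence the Least Upper Bound
Property implies that $A$ has a least upper bound. Let $\varepsilon > 0$. By Lemma 2.6.5 (1) there
is some $a_N \in A$ such that $\operatorname{lub} A - \varepsilon < a_N \leq \operatorname{lub} A$. Suppose that $n \in \mathbb{N}$ and $n \geq N$.
Because $\{a_n\}_{n=1}^\infty$ is increasing, it follows that $\operatorname{lub} A - \varepsilon < a_N \leq a_n \leq \operatorname{lub} A$, and hence
$|a_n - \operatorname{lub} A| < \varepsilon$. Therefore $\lim_{n \to \infty} a_n = \operatorname{lub} A$.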
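
The mechanism of this proof can be traced on a concrete sequence. The Python sketch below assumes the increasing sequence $a_n = \frac{n}{n+1}$, whose set of terms has least upper bound 1 (an example chosen here for illustration, not taken from the text). It finds the first index $N$ with $\operatorname{lub} A - \varepsilon < a_N$, and then spot-checks that the later terms stay trapped between $\operatorname{lub} A - \varepsilon$ and $\operatorname{lub} A$, exactly as monotonicity guarantees in the proof.

```python
from fractions import Fraction


def a(n):
    # Illustrative increasing sequence (chosen for this sketch): a_n = n/(n+1).
    return Fraction(n, n + 1)


lub = 1  # least upper bound of {a_n | n in N} for this particular sequence
eps = Fraction(1, 50)

# First index whose term exceeds lub - eps; one exists because
# lub - eps is not an upper bound of the set of terms.
N = next(n for n in range(1, 10**6) if a(n) > lub - eps)
print(N)  # 50

# Every later term checked lies in (lub - eps, lub], since the sequence is increasing.
assert all(lub - eps < a(n) <= lub for n in range(N, N + 2000))
```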


The following corollary is an immediate consequence of Theorem 8.3.3 and
Lemma 8.2.6.


Corollary 8.3.4 (Monotone Convergence Theorem). Let $\{a_n\}_{n=1}^\infty$ be a sequence
in $\mathbb{R}$. Suppose that $\{a_n\}_{n=1}^\infty$ is monotone. Then $\{a_n\}_{n=1}^\infty$ is convergent if and only if
$\{a_n\}_{n=1}^\infty$ is bounded.


The Monotone Convergence Theorem (Corollary 8.3.4) is useful for proving other 
theorems, for example some results about the convergence of series in Section 9.3. 
Moreover, this theorem is nice in that it is essentially the sequence analog of the 
Least Upper Bound Property. We will see in Theorem 8.3.17 below that the Monotone 
Convergence Theorem is equivalent to the Least Upper Bound Property; in fact, some 
people use the former in the axioms for the real numbers instead of the latter. 

As nice as the Monotone Convergence Theorem is in principle, however, it is not 
very useful from the point of view of proving that specific sequences are convergent, 
because most sequences are not monotone. We now turn to another approach to 
finding convergent sequences. Consider the sequence $\{(-1)^n\}_{n=1}^\infty$, which is certainly
not monotone. We notice, however, that even though this sequence is not convergent, it 
“contains” two convergent sequences inside it, namely, the collection of all terms that 
have value 1, and the collection of all terms that have value —1. In order to state our 
next important theorem, the Bolzano—Weierstrass Theorem, given as Theorem 8.3.9 
below, we first need to discuss the general notion of a sequence contained in another 
sequence. 

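Consider the sequence $\left\{\frac{1}{n}\right\}_{n=1}^\infty$, which we can write out as

\[ \frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots. \tag{8.3.1} \]

We can find many sequences that are “contained” in this sequence, for example the
sequences

\[ \frac{1}{1}, \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \ldots \]

\[ \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots. \tag{8.3.2} \]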


A sequence is not just a countably infinite set, but it is a countably infinite set listed in 
a specific order, and it is important to note that for one sequence to be “contained” in 
another the order of the terms must be preserved. For example, we do not consider 
the sequence 


col 


1 1 

per 

as being “contained” as a sequence in the sequence given in Equation 8.3.1, even 
though it is contained as a subset. In order to preserve the order of the original 
sequence, we need to look at the subscripts of the terms of the sequences under 
consideration. For example, if the sequence in Equation 8.3.1 is denoted $\{a_n\}_{n=1}^\infty$,
then we could denote the sequence given in Equation 8.3.2 by $\{a_{2^n}\}_{n=1}^\infty$; the reason
that the order of the terms in the original sequence is preserved is that the sequence
of subscripts $\{2^n\}_{n=1}^\infty$ is strictly increasing. This idea is formalized in the following
definition.


Definition 8.3.5. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Suppose that $\{a_n\}_{n=1}^\infty$ is defined by
a function $f\colon \mathbb{N} \to \mathbb{R}$ such that $f(n) = a_n$ for all $n \in \mathbb{N}$. Let $g\colon \mathbb{N} \to \mathbb{N}$ be a function.
Suppose that $g$ is strictly increasing. Then the sequence defined by the function
$f \circ g\colon \mathbb{N} \to \mathbb{R}$ is a subsequence of $\{a_n\}_{n=1}^\infty$. The sequence defined by $f \circ g$ is written
as $\{a_{n_k}\}_{k=1}^\infty$, where $g(k) = n_k$ for all $k \in \mathbb{N}$.


Although Definition 8.3.5 is phrased in terms of the function $g$, in practice we
will virtually never explicitly mention this function, and we will define subsequences
by writing expressions of the form $\{a_{n_k}\}_{k=1}^\infty$. Observe that the function $g$ in
Definition 8.3.5 is strictly increasing if and only if the sequence $\{n_k\}_{k=1}^\infty$ is a strictly
increasing sequence in $\mathbb{N}$.
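Definition 8.3.5 can be made concrete with a short computation. The Python sketch below is only an illustration and is not part of the text: the helper names `subsequence` and `is_strictly_increasing_on` are hypothetical, and the strict monotonicity of $g$ can only be spot-checked on finitely many indices. It builds the subsequence $\{a_{2^k}\}_{k=1}^\infty$ of $\left\{\frac{1}{n}\right\}_{n=1}^\infty$ as the composition $f \circ g$.

```python
from typing import Callable


def subsequence(f: Callable[[int], float], g: Callable[[int], int]) -> Callable[[int], float]:
    """Return the sequence k -> f(g(k)), that is, the composition f o g."""
    return lambda k: f(g(k))


def is_strictly_increasing_on(g: Callable[[int], int], upto: int) -> bool:
    """Spot-check g(1) < g(2) < ... < g(upto); a finite check, not a proof."""
    return all(g(k) < g(k + 1) for k in range(1, upto))


def a(n: int) -> float:
    return 1 / n       # a_n = 1/n, the sequence of Equation 8.3.1


def g(k: int) -> int:
    return 2 ** k      # n_k = 2^k, a strictly increasing choice of subscripts


sub = subsequence(a, g)
first_terms = [sub(k) for k in range(1, 5)]   # the terms 1/2, 1/4, 1/8, 1/16
```

Replacing `g` by a function that is not strictly increasing would still produce a list of terms of the original sequence, but not a subsequence in the sense of the definition.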


Example 8.3.6. 


(1) Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Taking every other term of this sequence
starting with the first yields the subsequence $\{a_{2n-1}\}_{n=1}^\infty$. Formally, using
Definition 8.3.5, we use the function $g\colon \mathbb{N} \to \mathbb{N}$ defined by $g(n) = 2n - 1$ for all $n \in \mathbb{N}$ to
define this subsequence, though we do so only to show that it can be done, and from
now on we will not write the function $g$ explicitly.

(2) Let $\{b_n\}_{n=1}^\infty$ be defined by $b_n = (-1)^n$ for all $n \in \mathbb{N}$. We saw in
Example 8.2.4 (4) that $\{b_n\}_{n=1}^\infty$ is divergent. The subsequence $\{b_{2n}\}_{n=1}^\infty$ is the sequence
that is constantly $1$, and this subsequence is convergent by Example 8.2.4 (1). Hence,
we see that a sequence can be divergent, but have a convergent subsequence.

(3) The sequence $\{n\}_{n=1}^\infty$ is divergent, and every subsequence of it is also divergent.
Hence, not every sequence has a convergent subsequence.

(4) Let $\{c_n\}_{n=1}^\infty$ be defined by

$$c_n = \begin{cases} 1, & \text{if } n \text{ is even} \\ n, & \text{if } n \text{ is odd.} \end{cases}$$

The sequence $\{c_n\}_{n=1}^\infty$ is divergent, and it is not bounded, but in contrast to the
sequence in Part (3) of this example, the sequence $\{c_n\}_{n=1}^\infty$ has a convergent
subsequence, which is $\{c_{2n}\}_{n=1}^\infty$.


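The following lemma says that if a sequence is convergent, then so is every
subsequence.


Lemma 8.3.7. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$, and let $\{a_{n_k}\}_{k=1}^\infty$ be a subsequence of
$\{a_n\}_{n=1}^\infty$. If $\{a_n\}_{n=1}^\infty$ is convergent, then $\{a_{n_k}\}_{k=1}^\infty$ is convergent and
$\lim_{k\to\infty} a_{n_k} = \lim_{n\to\infty} a_n$.


Proof. Suppose that $\{a_n\}_{n=1}^\infty$ is convergent. Let $\varepsilon > 0$. Let $L = \lim_{n\to\infty} a_n$. Then there is
some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $|a_n - L| < \varepsilon$. Suppose that $k \in \mathbb{N}$ and
$k \ge N$. Then by Exercise 8.3.5 we see that $n_k \ge k \ge N$. Hence $|a_{n_k} - L| < \varepsilon$. It follows
that $\lim_{k\to\infty} a_{n_k} = L$.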


In Example 8.3.6 (2) we saw a divergent sequence that has a convergent subsequence,
and hence Lemma 8.3.7 cannot be made into an if and only if statement.
In Example 8.3.6 (4) we saw a divergent sequence that is not bounded and that has
a convergent subsequence, and in Example 8.3.6 (3) we saw a divergent sequence that 
is not bounded and that does not have a convergent subsequence. In Example 8.3.6 (2) 
we saw a divergent sequence that is bounded and that has a convergent subsequence. 
The reader will have noticed that missing from Example 8.3.6 is a divergent sequence 
that is bounded and that does not have a convergent subsequence. It turns out, as we 
will see by the Bolzano—Weierstrass Theorem, which is the second of our important 
theorems and which is stated as Theorem 8.3.9 below, that no such example exists. 
We start with the following somewhat surprising lemma. Refer to Example 2.5.13 for 
the definition of a greatest element of a set. 


Lemma 8.3.8. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Then $\{a_n\}_{n=1}^\infty$ has a monotone
subsequence.


Proof. For each $k \in \mathbb{N}$, let $T_k = \{a_n \mid n \in \mathbb{N} \text{ and } n \ge k\}$. There are two cases.

First, suppose that $T_k$ has a greatest element for each $k \in \mathbb{N}$. We define a sequence
$\{n_k\}_{k=1}^\infty$ in $\mathbb{N}$ using Definition by Recursion as follows. Let $n_1 \in \mathbb{N}$ be such that $a_{n_1}$
is the greatest element of $T_1$. (Although the set $T_1$ has a unique greatest element
by Exercise 2.5.17, there may be more than one $i \in \mathbb{N}$ such that $a_i$ is the greatest
element of $T_1$, in which case we let $n_1$ be one such number; to avoid choosing, it
would be possible to select the smallest possible $n_1$ by the Well-Ordering Principle
(Theorem 1.2.10, Axiom 1.4.4 or Theorem 2.4.6).) Let $n_2 \in \mathbb{N}$ be such that $n_2 \ge n_1 + 1$
and $a_{n_2}$ is the greatest element of $T_{n_1+1}$. Then $n_2 > n_1$. Similarly, there is some $n_3 \in \mathbb{N}$
such that $n_3 > n_2$ and $a_{n_3}$ is the greatest element of $T_{n_2+1}$. Continuing in this way,
we define a sequence $\{n_k\}_{k=1}^\infty$ in $\mathbb{N}$ that is strictly increasing, and such that $a_{n_k}$ is the
greatest element of $T_{n_{k-1}+1}$ for all $k \in \mathbb{N}$ such that $k \ge 2$.

We have therefore defined a subsequence $\{a_{n_k}\}_{k=1}^\infty$ of $\{a_n\}_{n=1}^\infty$. Let $i, j \in \mathbb{N}$.
Suppose that $i < j$. Then $j \ge 2$, and hence $a_{n_j}$ is the greatest element of $T_{n_{j-1}+1}$.
There are now two subcases. First, suppose that $i = 1$. Then $a_{n_1}$ is the greatest element
of $T_1$. Because $n_{j-1} + 1 \ge 1$, then $T_1 \supseteq T_{n_{j-1}+1}$. It follows from Exercise 2.5.18 (1)
that $a_{n_1} \ge a_{n_j}$. Second, suppose that $i > 1$. Then $i \ge 2$, and hence $a_{n_i}$ is the greatest
element of $T_{n_{i-1}+1}$. Because $\{n_k\}_{k=1}^\infty$ is a strictly increasing sequence in $\mathbb{N}$, then
$i < j$ implies that $n_{i-1} + 1 < n_{j-1} + 1$, and hence $T_{n_{i-1}+1} \supseteq T_{n_{j-1}+1}$. It follows from
Exercise 2.5.18 (1) again that $a_{n_i} \ge a_{n_j}$. By putting together the two subcases, we see
that $\{a_{n_k}\}_{k=1}^\infty$ is decreasing, and is therefore monotone.

Second, suppose that there is some $r \in \mathbb{N}$ such that $T_r$ does not have a greatest
element. We define a sequence $\{m_k\}_{k=1}^\infty$ in $\mathbb{N}$ using Definition by Recursion as follows.
Let $m_1 = r$. Then $a_{m_1} \in T_{m_1} = T_r$, and by hypothesis $a_{m_1}$ is not a greatest element
of $T_{m_1}$. Therefore there is some $m_2 \in \mathbb{N}$ such that $m_2 \ge m_1 + 1$ and $a_{m_2} > a_{m_1}$. (As
before, the choice of $m_2$ is not necessarily unique; to avoid choosing, it would be
possible to select the smallest possible $m_2$ by the Well-Ordering Principle.) Hence
$m_2 > m_1$. Then $T_{m_1} \supseteq T_{m_2}$ and $T_{m_1} - T_{m_2} \subseteq \{a_{m_1}, a_{m_1+1}, \ldots, a_{m_2-1}\}$, so that $T_{m_1} - T_{m_2}$ is a finite
set. Using Exercise 2.5.18 (2) we see that $T_{m_2}$ has no greatest element. Similarly, there
is some $m_3 \in \mathbb{N}$ such that $m_3 > m_2$ and $a_{m_3} > a_{m_2}$. Continuing in this way, we define
a sequence $\{m_k\}_{k=1}^\infty$ in $\mathbb{N}$ that is strictly increasing, and such that the subsequence
$\{a_{m_k}\}_{k=1}^\infty$ of $\{a_n\}_{n=1}^\infty$ is strictly increasing. Hence $\{a_{m_k}\}_{k=1}^\infty$ is monotone.

We are now ready to prove the second of our important theorems, which is both
interesting in its own right and a very useful tool in real analysis.


Theorem 8.3.9 (Bolzano–Weierstrass Theorem). Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$.
If $\{a_n\}_{n=1}^\infty$ is bounded, then $\{a_n\}_{n=1}^\infty$ has a convergent subsequence.


Proof. Suppose that $\{a_n\}_{n=1}^\infty$ is bounded. By Lemma 8.3.8 we know that $\{a_n\}_{n=1}^\infty$
has a monotone subsequence. Because $\{a_n\}_{n=1}^\infty$ is bounded then so is the monotone
subsequence, and hence this subsequence is convergent by the Monotone Convergence
Theorem (Corollary 8.3.4).
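The first case in the proof of Lemma 8.3.8 can be traced on a finite list of terms. The Python sketch below rests on a strong simplifying assumption: a finite list stands in for the sequence, so every tail has a greatest element and the second case of the proof never arises. The function name `decreasing_subsequence_indices` is hypothetical. Choosing the smallest index at which the greatest element occurs mirrors the parenthetical remark about the Well-Ordering Principle.

```python
def decreasing_subsequence_indices(terms):
    """Return 1-based indices n_1 < n_2 < ... chosen as in the first case of the proof.

    At each step the tail that starts just after the previously chosen index
    is examined, and the smallest index at which the greatest element of
    that tail occurs is selected.
    """
    indices = []
    start = 0  # 0-based position where the current tail begins
    while start < len(terms):
        tail = terms[start:]
        greatest = max(tail)
        position = start + tail.index(greatest)  # first occurrence within the tail
        indices.append(position + 1)             # convert to a 1-based subscript
        start = position + 1                     # the next tail begins after this index
    return indices


terms = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
chosen = decreasing_subsequence_indices(terms)
values = [terms[n - 1] for n in chosen]
```

For this list the chosen subscripts are strictly increasing and the selected terms are decreasing, as the proof predicts; nothing here replaces the argument for infinite sequences.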


Another proof of the Bolzano–Weierstrass Theorem (Theorem 8.3.9), which uses
the Nested Interval Theorem (Theorem 8.4.7) instead of the Monotone Convergence
Theorem (Corollary 8.3.4), is given in Exercise 8.4.9; the reader is encouraged to try
that exercise after reading Section 8.4, because it is based upon a very nice idea.

Whereas the Bolzano—Weierstrass Theorem is stated in terms of sequences, the 
essential idea of the theorem, which is that an infinite collection of points that are 
confined to an appropriate region must get arbitrarily close to some single point, holds 
in more general situations. This more general idea is related to the topological notion 
of “compactness”; see [Mun00, Section 28] for details. 

We are now ready for the characterization of the convergence of sequences 
promised at the beginning of this section. We start with a precise definition of the 
notion of the terms of a sequence getting closer and closer to each other. 


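Definition 8.3.10. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. The sequence $\{a_n\}_{n=1}^\infty$ is a
Cauchy sequence if for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and
$n, m \ge N$ imply $|a_n - a_m| < \varepsilon$.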


Example 8.3.11. 


(1) We will prove that $\left\{\frac{1}{n}\right\}_{n=1}^\infty$ is a Cauchy sequence. Let $\varepsilon > 0$. By
Corollary 2.6.8 (2) there is some $N \in \mathbb{N}$ such that $\frac{1}{N} < \varepsilon$. Suppose that $n, m \in \mathbb{N}$ and
$n, m \ge N$. Without loss of generality, assume that $m \ge n$. Then $|m - n| < m$, and
therefore

$$\left|\frac{1}{n} - \frac{1}{m}\right| = \frac{|m - n|}{nm} < \frac{m}{nm} = \frac{1}{n} \le \frac{1}{N} < \varepsilon.$$

(2) We will prove that $\{(-1)^n\}_{n=1}^\infty$ is not a Cauchy sequence. Let $\varepsilon = 1$. Let
$N \in \mathbb{N}$. Choose $n, m \in \mathbb{N}$ such that $n, m \ge N$, and that $n$ is even and $m$ is
odd. Then

$$|(-1)^n - (-1)^m| = |1 - (-1)| = 2 > \varepsilon.$$

We therefore see that for the given $\varepsilon$, there is no “$N$” that works, and hence
$\{(-1)^n\}_{n=1}^\infty$ is not a Cauchy sequence.
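The contrast between the two parts of Example 8.3.11 can be observed numerically. The sketch below is only a finite spot-check, not a proof: the helper `max_gap` is a hypothetical name, and it examines a window of 200 consecutive indices (an arbitrary choice) starting at $N$, whereas the definition quantifies over all $n, m \ge N$.

```python
def max_gap(a, N, window=200):
    """Largest |a(n) - a(m)| over N <= n, m < N + window (a finite check only)."""
    values = [a(n) for n in range(N, N + window)]
    # The largest difference between two values is the maximum minus the minimum.
    return max(values) - min(values)


def reciprocal(n):
    return 1 / n


def alternating(n):
    return (-1) ** n


gap_reciprocal = max_gap(reciprocal, 100)    # stays below 1/100, matching 1/n <= 1/N
gap_alternating = max_gap(alternating, 100)  # equals 2 no matter how large N is
```

Increasing `N` shrinks the first gap but leaves the second fixed at 2, which is the behavior the example establishes for all indices.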


What is the relation between a sequence being a Cauchy sequence and being 
convergent? One aspect of this relationship is quite simple. If a sequence is convergent, 
then its terms get closer and closer to a single number, which implies that the terms 
must also get closer and closer to each other.


Theorem 8.3.12. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. If $\{a_n\}_{n=1}^\infty$ is convergent, then
$\{a_n\}_{n=1}^\infty$ is a Cauchy sequence.


Proof. Suppose that $\{a_n\}_{n=1}^\infty$ is convergent. Let $L = \lim_{n\to\infty} a_n$. Let $\varepsilon > 0$. There is some
$N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $|a_n - L| < \frac{\varepsilon}{2}$. Suppose that $n, m \in \mathbb{N}$ and
$n, m \ge N$. Then

$$|a_n - a_m| = |a_n - L + L - a_m| \le |a_n - L| + |L - a_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$


The proof of Theorem 8.3.12 does not use anything other than the definitions of 
convergence and Cauchy sequences, and basic algebraic properties of the real numbers. 
In particular, the Least Upper Bound Property was not used. Hence, this theorem 
would still be true if the real numbers were replaced with the rational numbers. On 
the other hand, the converse to Theorem 8.3.12 is definitely not true for the rational 
numbers. For example, let $\{t_n\}_{n=1}^\infty$ be a sequence of rational numbers that converges
to $\sqrt{2}$; such a sequence exists by Exercise 8.2.14. Because a sequence of rational
numbers is also a sequence of real numbers, and because $\{t_n\}_{n=1}^\infty$ converges in $\mathbb{R}$ to
$\sqrt{2}$, then it follows from Theorem 8.3.12 that this sequence is a Cauchy sequence
when viewed as a sequence in $\mathbb{R}$. It is then also a Cauchy sequence when viewed as a
sequence in $\mathbb{Q}$. However, because $\sqrt{2}$ is not rational, as we know by Theorem 2.6.11,
it follows that $\{t_n\}_{n=1}^\infty$ is not convergent as a sequence in $\mathbb{Q}$, even though it is a Cauchy
sequence in $\mathbb{Q}$.
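A sequence of this kind can be computed with exact rational arithmetic. The sketch below assumes one particular choice, the iteration $t_{n+1} = \frac{1}{2}\left(t_n + \frac{2}{t_n}\right)$ with $t_1 = 2$; this is not necessarily the sequence intended in Exercise 8.2.14, and the function name is hypothetical. Every term is a `Fraction`, so the computation never leaves $\mathbb{Q}$.

```python
from fractions import Fraction


def rational_terms_near_sqrt2(count):
    """Return the first `count` terms of t_1 = 2, t_{n+1} = (t_n + 2/t_n)/2."""
    t = Fraction(2)
    terms = []
    for _ in range(count):
        terms.append(t)
        t = (t + 2 / t) / 2   # stays rational: only +, / on Fractions
    return terms


terms = rational_terms_near_sqrt2(6)
# Gaps between consecutive terms shrink, as expected of a Cauchy sequence.
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]
# t^2 - 2 becomes tiny but is never 0, since no rational number squares to 2.
errors = [t * t - 2 for t in terms]
```

The squares of the terms approach 2 while no term squares to exactly 2, which is the computational shadow of a Cauchy sequence in $\mathbb{Q}$ with no limit in $\mathbb{Q}$.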

In contrast to the situation in Q, we will see in Theorem 8.3.15 below that any 
Cauchy sequence in R is convergent in R, a fact that, as expected, ultimately relies 
upon the Least Upper Bound Property. We break up the proof of Theorem 8.3.15 
into two lemmas, which we now state and prove. The proofs of these lemmas do not 
make use of the Least Upper Bound Property, but the proof of Theorem 8.3.15 will 
use the Bolzano—Weierstrass Theorem, which relies upon the Monotone Convergence 
Theorem, which in turn uses the Least Upper Bound Property in its proof. 


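Lemma 8.3.13. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Suppose that $\{a_n\}_{n=1}^\infty$ is a Cauchy
sequence. If $\{a_n\}_{n=1}^\infty$ has a convergent subsequence $\{a_{n_k}\}_{k=1}^\infty$, then $\{a_n\}_{n=1}^\infty$ is
convergent and $\lim_{n\to\infty} a_n = \lim_{k\to\infty} a_{n_k}$.


Proof. Suppose that $\{a_{n_k}\}_{k=1}^\infty$ is a convergent subsequence of $\{a_n\}_{n=1}^\infty$. Let
$L = \lim_{k\to\infty} a_{n_k}$. Let $\varepsilon > 0$. Because $\{a_n\}_{n=1}^\infty$ is a Cauchy sequence, there is some $N \in \mathbb{N}$ such
that $n, m \in \mathbb{N}$ and $n, m \ge N$ imply $|a_n - a_m| < \frac{\varepsilon}{2}$. Because $\lim_{k\to\infty} a_{n_k} = L$, there is some
$M \in \mathbb{N}$ such that $k \in \mathbb{N}$ and $k \ge M$ imply $|a_{n_k} - L| < \frac{\varepsilon}{2}$. Let $P = \max\{M, N\}$. Suppose
that $n \in \mathbb{N}$ and $n \ge P$. Then $n \ge N$. By Exercise 8.3.5 we know that $n_P \ge P$, and hence
$n_P \ge M$ and $n_P \ge N$. Because $P \ge M$, we also have $|a_{n_P} - L| < \frac{\varepsilon}{2}$. Then

$$|a_n - L| = |a_n - a_{n_P} + a_{n_P} - L| \le |a_n - a_{n_P}| + |a_{n_P} - L| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$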


Lemma 8.3.14. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. If $\{a_n\}_{n=1}^\infty$ is a Cauchy sequence,
then $\{a_n\}_{n=1}^\infty$ is bounded.


Proof. Left to the reader in Exercise 8.3.15.


We are now ready for our main result about Cauchy sequences. 


Theorem 8.3.15. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. If $\{a_n\}_{n=1}^\infty$ is a Cauchy sequence,
then $\{a_n\}_{n=1}^\infty$ is convergent.


Proof. Suppose that $\{a_n\}_{n=1}^\infty$ is a Cauchy sequence. By Lemma 8.3.14 we know
that $\{a_n\}_{n=1}^\infty$ is bounded. By the Bolzano–Weierstrass Theorem (Theorem 8.3.9) we
know that $\{a_n\}_{n=1}^\infty$ has a convergent subsequence. By Lemma 8.3.13 we deduce that
$\{a_n\}_{n=1}^\infty$ is convergent.


The following corollary, the third of our important theorems, is simply a combina- 
tion of Theorem 8.3.12 and Theorem 8.3.15. 


Corollary 8.3.16 (Cauchy Completeness Theorem). Let $\{a_n\}_{n=1}^\infty$ be a sequence in
$\mathbb{R}$. Then $\{a_n\}_{n=1}^\infty$ is convergent if and only if $\{a_n\}_{n=1}^\infty$ is a Cauchy sequence.


The question of whether or not Cauchy sequences are convergent is important 
not only for the real numbers, but also in the more general context of metric spaces, 
where Cauchy sequences are not always convergent, and where such a space is called 
“complete” if all Cauchy sequences are convergent; see [Mun00, Sections 20 and 43] 
for details. 

As mentioned at the beginning of this section, our three main theorems rely 
upon the Least Upper Bound Property. We will now see that two of these theorems, 
namely, the Monotone Convergence Theorem (Corollary 8.3.4) and the Bolzano–Weierstrass
Theorem (Theorem 8.3.9), are each equivalent to the Least Upper Bound
Property. For a discussion of what we mean by “equivalent” in this context, see the
discussion in Section 3.5 prior to Lemma 3.5.3. Somewhat surprisingly, the Cauchy
Completeness Theorem (Corollary 8.3.16) does not imply the Least Upper Bound
Property. In the proof that the Monotone Convergence Theorem and the Bolzano–Weierstrass
Theorem imply the Least Upper Bound Property, we will make use of
the fact that $\lim_{n\to\infty} \frac{1}{2^n} = 0$, and the proof of that fact, given in Example 8.2.13,
made use of the fact that $\lim_{n\to\infty} \frac{1}{n} = 0$. This last fact was proved in Example 8.2.4 (2),
and that proof made use of Corollary 2.6.8 (2), which in turn was a consequence of
the Archimedean Property (Theorem 2.6.7). If an ordered field does not satisfy the
Archimedean Property, then by Exercise 8.3.16 we see that $\left\{\frac{1}{n}\right\}_{n=1}^\infty$ does not converge


to 0, which may seem strange, but it demonstrates the importance of the Archimedean 
Property, and that this property should not be taken for granted. The proof of the 
Archimedean Property that we saw in Section 2.6 made use of the Least Upper Bound 
Property, though, as seen in Exercise 8.3.18, the Archimedean Property can also be 
proved using the Monotone Convergence Theorem. Hence, when proving that the 
Monotone Convergence Theorem implies the Least Upper Bound Property, we can 
make use of the Archimedean Property and any of its consequences. 

However, and this is the surprising part, it is not the case that the Cauchy Completeness
Theorem implies the Archimedean Property, because there are examples
of ordered fields in which every Cauchy sequence is convergent, but for which the
Archimedean Property does not hold; see [GO03, Chapter 1, Example 7] for such an 
example. Because the Least Upper Bound Property implies the Archimedean Property, 
we deduce that the Cauchy Completeness Theorem is not equivalent to the Least 
Upper Bound Property. (It is the case that the Cauchy Completeness Theorem together 
with the Archimedean Property are equivalent to the Least Upper Bound Property; see 
[Olm62, Appendix Section 5] for a proof.) However, we note that our reliance on the 
Least Upper Bound Property in the proof of the Cauchy Completeness Theorem was 
necessary, and was not simply a matter of convenience (the reader is urged to trace 
the proof back to the Least Upper Bound Property). There exist ordered fields that do 
not satisfy the Cauchy Completeness Theorem (for example the rational numbers), 
and hence any proof of the Cauchy Completeness Theorem for the real numbers must 
ultimately rely upon some aspect of the real numbers beyond the axiom for an ordered 
field, and the only axiom for the real numbers other than that of an ordered field is the 
Least Upper Bound Property. 


Theorem 8.3.17. The following are equivalent. 


a. The Least Upper Bound Property. 
b. The Monotone Convergence Theorem. 
c. The Bolzano—Weierstrass Theorem. 


Proof. We have already seen that the axioms of the real numbers, that is, the axiom for 
an ordered field together with the Least Upper Bound Property, imply the Monotone 
Convergence Theorem and the Bolzano—Weierstrass Theorem. We also know that 
the Monotone Convergence Theorem implies the Bolzano—Weierstrass Theorem, as 
can be seen by examining the proof of the latter. It can also be seen that the Bolzano— 
Weierstrass Theorem implies the Monotone Convergence Theorem, by combining the 
former with Exercise 8.3.8. Hence the Bolzano—Weierstrass Theorem is equivalent 
to the Monotone Convergence Theorem, and to complete the proof, it will suffice 
to show that the Monotone Convergence Theorem together with the axiom for an 
ordered field imply the Least Upper Bound Property. 

The proof is by contradiction. Suppose that F is an ordered field that satisfies 
the Monotone Convergence Theorem, but does not satisfy the Least Upper Bound 
Property. We note by Exercise 8.3.18 that F satisfies the Archimedean Property. 

Let $a$, $b$, $A$, $P$ and $Q$ be as in Lemma 3.5.3. By Parts (1) and (2) of that lemma we
know that $P \cup Q = [a,b]$, and $P \cap Q = \emptyset$, and $a < b$, and $A \cap [a,b] \subseteq P$, and $a \in P$, and
$b \in Q$.

By Exercise 8.3.17 there is a family $\{[a_n, b_n]\}_{n=1}^\infty$ of closed bounded intervals
in $[a,b]$ such that for each $i \in \mathbb{N}$ the following three conditions hold: (1) $a_i \in P$ and
$b_i \in Q$; (2) $[a_{i+1}, b_{i+1}] \subseteq [a_i, b_i]$; and (3) $b_i - a_i = \frac{b-a}{2^{i-1}}$. By Condition (2) we know
that $a_i \le a_{i+1}$ for all $i \in \mathbb{N}$, and therefore by Exercise 8.3.1 we see that $\{a_n\}_{n=1}^\infty$ is
increasing. Hence $\{a_n\}_{n=1}^\infty$ is bounded below by $a_1$. Because $[a_n, b_n] \subseteq [a,b]$ for all
$n \in \mathbb{N}$, then $\{a_n\}_{n=1}^\infty$ is bounded above by $b$. Therefore $\{a_n\}_{n=1}^\infty$ is bounded. We will
show that $\{a_n\}_{n=1}^\infty$ is divergent, which will contradict the fact that the Monotone
Convergence Theorem holds for $F$.

Suppose that $\{a_n\}_{n=1}^\infty$ is convergent. Let $L = \lim_{n\to\infty} a_n$. Let $n \in \mathbb{N}$. For each $k \in \mathbb{N}$
such that $k \ge n$, we observe that $a_n \le a_k$ because $\{a_n\}_{n=1}^\infty$ is increasing, and $a_k < b_n$ by
Lemma 3.5.3 (3), making use of the fact that $a_k \in P$ and $b_n \in Q$. Therefore $a_k \in [a_n, b_n]$
for all $k \in \mathbb{N}$ such that $k \ge n$. Because the convergence of a sequence is not changed
by dropping finitely many terms of the sequence, we use Theorem 8.2.11 to deduce
that $L \in [a_n, b_n]$. It follows that $L \in \bigcap_{n=1}^\infty [a_n, b_n]$.

Because $[a_n, b_n] \subseteq [a,b]$ for all $n \in \mathbb{N}$, it follows that $L \in [a,b]$. Hence $L \in P$ or
$L \in Q$. First, suppose that $L \in Q$. Then $L$ is an upper bound of $A$ by the definition
of $Q$. Let $\varepsilon > 0$. Using Example 8.2.13, which holds for $F$ because $F$ satisfies the
Archimedean Property, together with Theorem 8.2.9 (3), we deduce that

$$\lim_{n\to\infty} (b_n - a_n) = \lim_{n\to\infty} (b - a)\frac{1}{2^{n-1}} = 0.$$

It follows that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$
and $n \ge N$ imply $|(b_n - a_n) - 0| < \varepsilon$. In particular $b_N - a_N < \varepsilon$. Because $L \in [a_N, b_N]$,
it follows that $L - a_N < \varepsilon$. Because $a_N \in P$, then by Lemma 3.5.3 (4) there is some
$x \in A$ such that $a_N < x$. Because $L$ is an upper bound of $A$, then $a_N < x \le L$. It
follows that $L - x < \varepsilon$. We now use Exercise 2.6.6 to deduce that $L = \operatorname{lub} A$, which is
a contradiction, because $A$ has no least upper bound. Second, suppose that $L \in P$. By
Lemma 3.5.3 (4) there is some $y \in P$ such that $L < y$. Let $\eta = y - L$. Then $\eta > 0$. Using
the same argument as before, there is some $K \in \mathbb{N}$ such that $b_K - a_K < \eta$. Because
$L \in [a_K, b_K]$, it follows that $b_K - L < \eta$. Therefore $b_K - L < y - L$, and hence $b_K < y$.
However, we know $y \in P$ and $b_K \in Q$, which is a contradiction to Lemma 3.5.3 (3).
We conclude that $\{a_n\}_{n=1}^\infty$ is divergent, which is the desired contradiction to the
Monotone Convergence Theorem.
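The family of intervals supplied by Exercise 8.3.17 can be produced by repeated bisection. The sketch below shows one possible construction and is not necessarily the intended solution of the exercise; the names `nested_intervals` and `in_P` are hypothetical. As an illustration only, it works in the ordered field $\mathbb{Q}$ (via `Fraction`) with $P = \{x \in [0,2] \mid x^2 < 2\}$ and $Q = [0,2] - P$, a choice made here for the example and not taken from Lemma 3.5.3.

```python
from fractions import Fraction


def nested_intervals(a, b, in_P, count):
    """Bisect [a, b] repeatedly, keeping the left endpoint in P and the right in Q.

    Assumes in_P(a) is true and in_P(b) is false. Returns `count` intervals;
    the interval at 0-based position i has length (b - a) / 2**i.
    """
    intervals = [(a, b)]
    for _ in range(count - 1):
        lo, hi = intervals[-1]
        mid = (lo + hi) / 2
        if in_P(mid):
            lo = mid    # midpoint lies in P: move the left endpoint up
        else:
            hi = mid    # midpoint lies in Q: move the right endpoint down
        intervals.append((lo, hi))
    return intervals


def in_P(x):
    return x * x < 2    # illustrative choice of P inside [0, 2]


intervals = nested_intervals(Fraction(0), Fraction(2), in_P, 8)
```

In this illustration the left endpoints form an increasing, bounded sequence of rational numbers with no rational limit, which is consistent with $\mathbb{Q}$ failing the Monotone Convergence Theorem.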


Reflections 


The three important theorems in this section are somewhat analogous, in both 
their content and uses, to the two important theorems in Section 3.5. Similarly to 
those two theorems, the Extreme Value Theorem and the Intermediate Value Theorem, 
the three theorems of the present section, the Monotone Convergence Theorem, the 
Bolzano—Weierstrass Theorem and the Cauchy Completeness Theorem, are also 
existence theorems, because they each state that a certain sequence (which could 
be a subsequence of the original sequence) is convergent, and that is the same as 
saying that there exists a real number to which the sequence converges. Just as the two 
theorems of Section 3.5 rely upon the Least Upper Bound Property of the real numbers 
(and in fact are equivalent to it), so too do the three theorems of the present section
rely upon the Least Upper Bound Property (two of the theorems are equivalent to the 
Least Upper Bound Property, and one necessarily relies upon it). 

Moreover, just as the two theorems of Section 3.5 are used to prove other useful 
theorems in real analysis (for example the Extreme Value Theorem is used to prove 
Rolle’s Theorem, which in turn is used to prove the Mean Value Theorem), the three 
theorems of the present section can also be used to prove substantial theorems in real 
analysis. Indeed, some texts in real analysis discuss sequences prior to the treatment 
of derivatives and integrals, and use these three theorems (and other theorems about
sequences) in proofs that we do without sequences. For example, in our proof of
Theorem 5.4.7 (which is the main technical tool we use for proving various properties 
of integrals), we make use of the Least Upper Bound Property via the No Gap Lemma 
(Lemma 2.6.6), whereas in some texts this theorem is proved using one of the theorems 
about sequences that rely upon the Least Upper Bound Property. 


Exercises 


Exercise 8.3.1. [Used in Theorem 8.3.17, Theorem 8.4.7 and Theorem 9.4.15.] Let
$\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Suppose that $a_n \le a_{n+1}$ for all $n \in \mathbb{N}$. Prove that $\{a_n\}_{n=1}^\infty$
is increasing. This result might seem obvious, but a proof is needed.


Exercise 8.3.2. Using only the definition of strictly increasing, prove that { a 4 


is strictly increasing. 


Exercise 8.3.3. [Used in Theorem 9.3.6.] Let $f\colon [1,\infty) \to \mathbb{R}$ be a function. Suppose
that $f$ is increasing. Prove that $\lim_{x\to\infty} f(x)$ exists if and only if the sequence $\{f(n)\}_{n=1}^\infty$
is convergent.

Exercise 8.3.4. [Used in Theorem 8.3.3.] Prove Theorem 8.3.3 (2). 


Exercise 8.3.5. [Used in Lemma 8.3.7 and Lemma 8.3.13.] Let $\{a_n\}_{n=1}^\infty$ be a sequence
in $\mathbb{R}$, and let $\{a_{n_k}\}_{k=1}^\infty$ be a subsequence of $\{a_n\}_{n=1}^\infty$. Prove that $n_k \ge k$ for all $k \in \mathbb{N}$.
Equivalently, let $g\colon \mathbb{N} \to \mathbb{N}$ be a strictly increasing function. Prove that $g(k) \ge k$ for
all $k \in \mathbb{N}$.


Exercise 8.3.6. [Used in Exercise 9.2.7 and Theorem 9.3.8.] Let $\{a_n\}_{n=1}^\infty$ be a sequence
in $\mathbb{R}$. Prove that $\{a_n\}_{n=1}^\infty$ is convergent if and only if $\{a_{2n}\}_{n=1}^\infty$ and $\{a_{2n-1}\}_{n=1}^\infty$ are
both convergent and $\lim_{n\to\infty} a_{2n} = \lim_{n\to\infty} a_{2n-1}$, and that if these conditions hold then
$\lim_{n\to\infty} a_n = \lim_{n\to\infty} a_{2n} = \lim_{n\to\infty} a_{2n-1}$.


Exercise 8.3.7. [Used in Exercise 9.4.6.] Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Suppose that
$\{a_{n_k}\}_{k=1}^\infty$ is a subsequence of $\{a_n\}_{n=1}^\infty$ such that if $k \in \mathbb{N}$, then $a_{n_k} = a_{n_k+1} = a_{n_k+2} =
\cdots = a_{n_{k+1}-1}$. Prove that $\{a_n\}_{n=1}^\infty$ is convergent, divergent or diverges to infinity if
and only if $\{a_{n_k}\}_{k=1}^\infty$ is convergent, divergent or diverges to infinity, respectively.


Exercise 8.3.8. [Used in Theorem 8.3.17 and Exercise 9.3.1.] Let $\{a_n\}_{n=1}^\infty$ be a
sequence in $\mathbb{R}$, and let $\{a_{n_k}\}_{k=1}^\infty$ be a subsequence of $\{a_n\}_{n=1}^\infty$. Suppose that $\{a_n\}_{n=1}^\infty$ is
monotone. Prove that if $\{a_{n_k}\}_{k=1}^\infty$ is convergent, then $\{a_n\}_{n=1}^\infty$ is convergent.


Exercise 8.3.9. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$.

(1) Suppose that $\{a_n\}_{n=1}^\infty$ is bounded and divergent. Prove that $\{a_n\}_{n=1}^\infty$ has at
least two convergent subsequences that converge to different numbers.

(2) Suppose that $\{a_n\}_{n=1}^\infty$ is bounded. Suppose that all convergent subsequences
of $\{a_n\}_{n=1}^\infty$ have the same limit. Prove that $\{a_n\}_{n=1}^\infty$ is convergent.

(3) Find an example of a sequence $\{b_n\}_{n=1}^\infty$ in $\mathbb{R}$, and a number $M \in \mathbb{R}$, such that
if $\{b_{n_k}\}_{k=1}^\infty$ is a convergent subsequence of $\{b_n\}_{n=1}^\infty$ then $\lim_{k\to\infty} b_{n_k} = M$, but
that $\{b_n\}_{n=1}^\infty$ is divergent.


Exercise 8.3.10. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$, and let $\{a_{n_k}\}_{k=1}^\infty$ be a subsequence
of $\{a_n\}_{n=1}^\infty$. Prove that if $\lim_{n\to\infty} a_n = \infty$, then $\lim_{k\to\infty} a_{n_k} = \infty$.


Exercise 8.3.11. Using only the definition of Cauchy sequences, prove that { Ti } 
n= 
is a Cauchy sequence. 


co 


Exercise 8.3.12. Using only the definition of Cauchy sequences, prove that { ane } 
is a Cauchy sequence. 


n=1 


Exercise 8.3.13. Let $\{a_n\}_{n=1}^\infty$ be a Cauchy sequence in $\mathbb{R}$. Using only the definition
of Cauchy sequences, prove that $\{|a_n|\}_{n=1}^\infty$ is a Cauchy sequence.


Exercise 8.3.14. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be Cauchy sequences in $\mathbb{R}$. Let $k \in \mathbb{R}$.

(1) Using only the definition of Cauchy sequences, prove that $\{a_n + b_n\}_{n=1}^\infty$ is a
Cauchy sequence.

(2) Using only the definition of Cauchy sequences, prove that $\{k a_n\}_{n=1}^\infty$ is a
Cauchy sequence.

(3) Using only the definition of Cauchy sequences and Lemma 8.3.14, prove that
$\{a_n b_n\}_{n=1}^\infty$ is a Cauchy sequence.


Exercise 8.3.15. [Used in Lemma 8.3.14.] Prove Lemma 8.3.14. 


Exercise 8.3.16. [Used in Section 8.3.] Let $F$ be an ordered field. Suppose that $F$ does
not satisfy the Archimedean Property. Prove that $\left\{\frac{1}{n}\right\}_{n=1}^\infty$ does not converge to $0$.


Exercise 8.3.17. [Used in Theorem 8.3.17.] Let $F$ be an ordered field, and let $[a,b] \subseteq F$
be a non-degenerate closed bounded interval. Suppose that there are subsets $P, Q \subseteq
[a,b]$ such that $a \in P$ and $b \in Q$, and that $P \cup Q = [a,b]$ and $P \cap Q = \emptyset$. Prove that there
is a family $\{[a_n, b_n]\}_{n=1}^\infty$ of closed bounded intervals in $[a,b]$ such that for each $i \in \mathbb{N}$
the following three conditions hold: (1) $a_i \in P$ and $b_i \in Q$; (2) $[a_{i+1}, b_{i+1}] \subseteq [a_i, b_i]$;
and (3) $b_i - a_i = \frac{b-a}{2^{i-1}}$.


Exercise 8.3.18. [Used in Section 8.3 and Theorem 8.3.17.] The proof of the Archi- 
medean Property (Theorem 2.6.7) made use of the Least Upper Bound Property. 
Modify the proof of the Archimedean Property so that it relies upon the Monotone 
Convergence Theorem (Corollary 8.3.4) instead of the Least Upper Bound Property. 


8.4 Applications of Sequences 


Sequences have many uses in real analysis, and throughout mathematics. In particular, 
there are a number of connections between sequences and some of the topics we have 
seen in previous chapters, and in this section we will see a few such applications of 
sequences. The material in this section might seem haphazard, and indeed it is; the 
purpose of this section is to show the wide range of uses of sequences, not to build 
toward any one specific result. 

We start with the relation of the limits of sequences to the limits of functions. 
Most of the theorems and proofs about limits of sequences in Section 8.2 are very
similar to the analogous theorems and proofs about limits of functions in Section 3.2.
In fact, limits of sequences and limits of functions are not just analogous, but they are 
concretely related to each other, as seen in the following theorem. 


Theorem 8.4.1 (Sequential Characterization of Limits). Let $I \subseteq \mathbb{R}$ be an open
interval, let $c \in I$, let $f\colon I - \{c\} \to \mathbb{R}$ be a function and let $L \in \mathbb{R}$. Then $\lim_{x\to c} f(x) = L$ if
and only if $\lim_{n\to\infty} f(c_n) = L$ for every sequence $\{c_n\}_{n=1}^\infty$ in $I - \{c\}$ such that $\lim_{n\to\infty} c_n = c$.


Proof. Suppose that $\lim_{x\to c} f(x) = L$. Let $\{c_n\}_{n=1}^\infty$ be a sequence in $I - \{c\}$ such that
$\lim_{n\to\infty} c_n = c$. Let $\varepsilon > 0$. Because $\lim_{x\to c} f(x) = L$, there is some $\delta > 0$ such that $x \in I - \{c\}$
and $|x - c| < \delta$ imply $|f(x) - L| < \varepsilon$. Because $\lim_{n\to\infty} c_n = c$, there is some $N \in \mathbb{N}$ such
that $n \in \mathbb{N}$ and $n \ge N$ imply $|c_n - c| < \delta$. Suppose that $n \in \mathbb{N}$ and $n \ge N$. Then
$|c_n - c| < \delta$, and we know by definition that $c_n \in I - \{c\}$. Hence $|f(c_n) - L| < \varepsilon$. It
follows that $\lim_{n\to\infty} f(c_n) = L$.

Next, suppose that $\lim_{n\to\infty} f(c_n) = L$ for every sequence $\{c_n\}_{n=1}^\infty$ in $I - \{c\}$ such that
$\lim_{n\to\infty} c_n = c$. Suppose also that $\lim_{x\to c} f(x) \ne L$. Then there is some $\varepsilon > 0$ such that for
all $\delta > 0$, it is not the case that $x \in I - \{c\}$ and $|x - c| < \delta$ imply $|f(x) - L| < \varepsilon$. Let
$n \in \mathbb{N}$. Then it is not the case that $x \in I - \{c\}$ and $|x - c| < \frac{1}{n}$ imply $|f(x) - L| < \varepsilon$.
Hence, there is some $x_n \in I - \{c\}$ such that $|x_n - c| < \frac{1}{n}$ and $|f(x_n) - L| \ge \varepsilon$. We have
therefore defined a sequence $\{x_n\}_{n=1}^\infty$ in $I - \{c\}$. It follows from Exercise 8.2.9 that
$\lim_{n\to\infty} x_n = c$, where we use the sequence $\{b_n\}_{n=1}^\infty$ defined by $b_n = \frac{1}{n}$ for all $n \in \mathbb{N}$, and
$N = 1$. However, the fact that $|f(x_n) - L| \ge \varepsilon$ for all $n \in \mathbb{N}$ means that $\lim_{n\to\infty} f(x_n) \ne L$,
which is a contradiction. Hence $\lim_{x\to c} f(x) = L$.


It is important to observe in the statement of Theorem 8.4.1 that in order to know 
that the limit of a function exists, it is necessary to know something about the limits 
of all appropriate sequences; just knowing that the condition holds for one such 
sequence is not sufficient to imply that the limit of the function exists, as is seen in 
Exercise 8.4.1. 
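The need to consider every sequence, rather than a single one, can be seen numerically. The sketch below uses $f(x) = \sin\frac{1}{x}$ near $c = 0$, a standard illustration chosen here as an assumption; it is not necessarily the function used in Exercise 8.4.1. The two sequences `c` and `d` both converge to $0$, yet the values of $f$ along them settle near different numbers (up to floating-point rounding).

```python
import math


def f(x):
    return math.sin(1 / x)


# Two sequences in R - {0} that both converge to 0.
c = [1 / (n * math.pi) for n in range(1, 1001)]                     # 1/c_n = n*pi
d = [1 / (math.pi / 2 + 2 * n * math.pi) for n in range(1, 1001)]   # 1/d_n = pi/2 + 2*n*pi

along_c = [f(x) for x in c]   # values close to sin(n*pi) = 0
along_d = [f(x) for x in d]   # values close to sin(pi/2 + 2*n*pi) = 1
```

Looking only at the first sequence would suggest the limit $0$, but the second sequence shows that no single number can serve as $\lim_{x\to 0} f(x)$, in line with Theorem 8.4.1.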

Just as sequences can be used to detect the existence of limits of functions, 
sequences can similarly be used to detect continuity. 


Corollary 8.4.2 (Sequential Characterization of Continuity). Let $I \subseteq \mathbb{R}$ be an
open interval, and let $f\colon I \to \mathbb{R}$ be a function. Then $f$ is continuous if and only if

$$\lim_{n\to\infty} f(c_n) = f\left(\lim_{n\to\infty} c_n\right)$$

for every sequence $\{c_n\}_{n=1}^\infty$ in $I$ such that $\{c_n\}_{n=1}^\infty$ is convergent and $\lim_{n\to\infty} c_n \in I$.
n—00 


Proof. Left to the reader in Exercise 8.4.2. 


We now use the Sequential Characterization of Continuity (Corollary 8.4.2),
together with l'Hôpital's Rule for $\frac{0}{0}$ (Theorem 6.3.5), to provide justification for a
well-known formula for the value of the number $e$.

Example 8.4.3. Recall our definition of the number $e$ in Definition 7.2.9. That is
certainly a correct definition, but it looks rather different from another common
definition of $e$ that is often used in high school algebra, which is

$$e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n.$$

We now show that this limit exists, and that it equals the number $e$ as we defined it in
Definition 7.2.9.

Let $r \in \mathbb{R}$. Then, by the one-sided analog of l'Hôpital's Rule for $\frac{0}{0}$ (Theorem 6.3.5)
we see that

$$\lim_{x\to 0^+} \frac{\ln(1 + rx)}{x} = \lim_{x\to 0^+} \frac{\frac{r}{1+rx}}{1} = r.$$

Next, by Exercise 6.3.9 (1), we deduce that

$$\lim_{t\to\infty} \ln\left(1 + \frac{r}{t}\right)^t = \lim_{t\to\infty} t \ln\left(1 + \frac{r}{t}\right) = \lim_{x\to 0^+} \frac{1}{x} \ln(1 + rx) = \lim_{x\to 0^+} \frac{\ln(1 + rx)}{x} = r.$$

It now follows from Exercise 8.2.10 (1) that

$$\lim_{n\to\infty} \ln\left(1 + \frac{r}{n}\right)^n = r.$$

By Theorem 7.2.7 (2) and Theorem 4.2.4 we know that $e^x$ is a continuous function.
We can now use Theorem 7.2.14 (4) together with the Sequential Characterization of
Continuity (Corollary 8.4.2) to deduce that

$$\lim_{n\to\infty} \left(1 + \frac{r}{n}\right)^n = \lim_{n\to\infty} e^{\ln\left(1 + \frac{r}{n}\right)^n} = e^{\lim_{n\to\infty} \ln\left(1 + \frac{r}{n}\right)^n} = e^r.$$

The above equation gives a nice characterization of the exponential function. In
particular, if we choose $r = 1$, and use Theorem 7.2.12 (1), we see that

$$e = e^1 = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n.$$
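The limit in Example 8.4.3 can be observed numerically. The sketch below is a floating-point illustration only, with the hypothetical helper name `compound`; it compares $\left(1 + \frac{r}{n}\right)^n$ with `math.exp(r)` for a few large values of $n$, and the tolerances it suggests are empirical rather than part of the argument above.

```python
import math


def compound(r, n):
    """Return (1 + r/n)**n, the n-th term of the sequence in Example 8.4.3."""
    return (1 + r / n) ** n


# Distance from e for n = 10, 100, ..., 10**6 (the case r = 1).
errors_for_e = [abs(compound(1.0, 10 ** k) - math.e) for k in range(1, 7)]

# The same comparison for r = 2, against e**2.
error_for_r2 = abs(compound(2.0, 10 ** 6) - math.exp(2.0))
```

The errors shrink roughly in proportion to $\frac{1}{n}$, so the convergence is slow; this formula characterizes $e^r$ but is not an efficient way to compute it.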


Our next application of sequences is to integration. Recall that in the definition of 
integrals, given in Definition 5.2.4, it was necessary to consider all possible partitions 
and all possible representative sets of these partitions to verify that a function is 
integrable. In practice, doing so can be quite cumbersome. Is it possible to consider 
only some of the partitions and representative sets? The following theorem shows 
that if we already know in principle that a function is integrable, for example if it 
is continuous using Theorem 5.4.11, then the value of the integral can be computed 
as the limit of a single sequence of Riemann sums, using an appropriately chosen 
sequence of partitions and representative sets. 


Theorem 8.4.4. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is integrable. Let $\{P_n\}_{n=1}^{\infty}$ be a sequence
of partitions of $[a,b]$ such that $\lim_{n \to \infty} \|P_n\| = 0$. For each $n \in \mathbb{N}$, let $T_n$ be a representative
set of $P_n$. Then

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} S(f, P_n, T_n).$$


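Proof. Let $\varepsilon > 0$. Because $f$ is integrable, there is some $\delta > 0$ such that if $Q$ is a
partition of $[a,b]$ with $\|Q\| < \delta$, and if $S$ is a representative set of $Q$, then

$$\left| S(f, Q, S) - \int_a^b f(x)\,dx \right| < \varepsilon.$$

Because $\lim_{n \to \infty} \|P_n\| = 0$, there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \geq N$ imply
$|\|P_n\| - 0| < \delta$, which is equivalent to $\|P_n\| < \delta$. Suppose that $m \in \mathbb{N}$ and $m \geq N$. Then
$\|P_m\| < \delta$, and hence

$$\left| S(f, P_m, T_m) - \int_a^b f(x)\,dx \right| < \varepsilon.$$

It follows that

$$\lim_{n \to \infty} S(f, P_n, T_n) = \int_a^b f(x)\,dx.$$
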

It is important to note that if we do not already know that a given function is 
integrable, then looking at only a single sequence of partitions and representative sets 
as in Theorem 8.4.4 does not suffice to prove integrability. For example, in Exam- 
ple 5.2.6 (3), a sequence of Riemann sums with representative sets that have rational 
numbers will have one limit, but a sequence of Riemann sums with representative sets 
that have irrational numbers will have a different limit. 

The following example illustrates the use of Theorem 8.4.4 in computing the 
value of an integral. 


Example 8.4.5. We saw in Theorem 5.4.11 that all continuous functions on non-
degenerate closed bounded intervals are integrable. Hence we know that $\int_0^2 x^2\,dx$
exists. We can then use Theorem 8.4.4 to compute the value of this integral as follows.

Let $f\colon [0,2] \to \mathbb{R}$ be defined by $f(x) = x^2$ for all $x \in [0,2]$. Let $n \in \mathbb{N}$. Let the parti-
tion $P_n$ of $[0,2]$, and the representative set $T_n$ of $P_n$, be defined as in Example 5.2.3 (1).
It was seen in that example that $\|P_n\| = \frac{2}{n}$ and $S(f, P_n, T_n) = \frac{4(n+1)(2n+1)}{3n^2}$. Hence by
Example 8.2.4 (2) and Theorem 8.2.9 (3) we see that $\lim_{n \to \infty} \|P_n\| = 0$. Theorem 8.4.4
and Theorem 8.2.9 then imply that

$$\int_0^2 x^2\,dx = \lim_{n \to \infty} S(f, P_n, T_n) = \lim_{n \to \infty} \frac{4(n+1)(2n+1)}{3n^2} = \lim_{n \to \infty} \frac{4\left(1 + \frac{1}{n}\right)\left(2 + \frac{1}{n}\right)}{3} = \frac{8}{3}.$$


As seen by Example 5.6.6 (1), which uses the Fundamental Theorem of Calculus 
Version II (Theorem 5.6.4), our use of Theorem 8.4.4 yields the correct value of the
integral. Our point here, however, is not simply to obtain the correct value of the
integral, but rather to show that in some cases it is possible to use limits of Riemann 
sums to compute an integral directly, without making use of the Fundamental Theorem 
of Calculus, which is nice to see because the definition of integrals is not about 
antiderivatives per se. Of course, the method we have used here to compute an integral 
is very cumbersome, and would not be feasible if we did not happen to have a very 
convenient summation formula to use. Ultimately, we could not compute the value of 
many integrals without the Fundamental Theorem of Calculus. ◊
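The computation in Example 8.4.5 can be checked with exact rational arithmetic. The sketch below assumes, as the closed formula $\frac{4(n+1)(2n+1)}{3n^2}$ suggests, that the partition has $n$ subintervals of equal length and that the representative set consists of the right endpoints; the function names are ours and are introduced only for illustration.

```python
from fractions import Fraction


def riemann_sum_right(n: int) -> Fraction:
    """S(f, P_n, T_n) for f(x) = x^2 on [0, 2], using n equal subintervals
    and the right endpoint of each subinterval as its representative."""
    width = Fraction(2, n)
    return sum(((i * width) ** 2) * width for i in range(1, n + 1))


def closed_form(n: int) -> Fraction:
    """The formula 4(n+1)(2n+1) / (3 n^2) for the same Riemann sum."""
    return Fraction(4 * (n + 1) * (2 * n + 1), 3 * n * n)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        s = riemann_sum_right(n)
        # The difference from 8/3 shrinks as the norm 2/n of the partition shrinks.
        print(n, s, float(s - Fraction(8, 3)))
```

The exact difference between the Riemann sum and $\frac{8}{3}$ is $\frac{12n + 4}{3n^2}$, so roughly a thousand subintervals are needed for two correct decimal places, which illustrates how cumbersome the method is in practice.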


We now turn to a different use of Theorem 8.4.4, which is to give an intuitive 
explanation for the formula for the average value of a function. 


Example 8.4.6. We learn at an early age how to compute the average value of a finite
collection of numbers, which is by adding the numbers and then dividing by how
many numbers there are. It is not as obvious how to find the average value of an
infinite collection of numbers. In particular, let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ be a function. We want to find the average
value of the function $f$, which means that we want to find something that would be
called the average value of the numbers in the set $f([a,b])$, which is an infinite set.

One way to proceed would be to take finite samples of the elements of $f([a,b])$,
compute the average value of each sample, and then take the limit of these average
values as the sizes of the samples go to infinity, if such a limit exists. More specifically,
let $n \in \mathbb{N}$. We then select a collection of $n$ elements of $[a,b]$, which we denote
$T_n = \{t_1^n, t_2^n, \ldots, t_n^n\}$, where the superscript $n$ indicates that the numbers are in $T_n$. The
average value of the set $\{f(t_1^n), f(t_2^n), \ldots, f(t_n^n)\}$ is

$$\frac{f(t_1^n) + f(t_2^n) + \cdots + f(t_n^n)}{n}.$$

We then wish to look at

$$\lim_{n \to \infty} \frac{f(t_1^n) + f(t_2^n) + \cdots + f(t_n^n)}{n}$$

in the hope that this limit will exist. Even if the limit exists, for it to be meaningful
we would need to know that all such limits are equal no matter what numbers were
selected for each $T_n$. Unfortunately, if we do not place some restrictions on the choice
of the numbers $t_1^n, t_2^n, \ldots, t_n^n$, then they might all be clustered in one part of the interval
$[a,b]$, so that $f(t_1^n), f(t_2^n), \ldots, f(t_n^n)$ might not accurately represent the function. To
remedy this problem, we need to ensure that the numbers $t_1^n, t_2^n, \ldots, t_n^n$ are reasonably
evenly spaced, though not necessarily exactly evenly spaced. One way to ensure that
the numbers $t_1^n, t_2^n, \ldots, t_n^n$ are roughly spread out is to let $P_n = \{x_0^n, x_1^n, x_2^n, \ldots, x_n^n\}$ be
the partition of $[a,b]$ with $n$ intervals of length $\frac{b-a}{n}$, and to insist that $t_i^n \in [x_{i-1}^n, x_i^n]$
for all $i \in \{1, \ldots, n\}$. In other words, we suppose that $T_n$ is a representative set of $P_n$.
Then

$$\frac{f(t_1^n) + f(t_2^n) + \cdots + f(t_n^n)}{n} = \frac{1}{b-a} \sum_{i=1}^{n} f(t_i^n)\,\frac{b-a}{n} = \frac{1}{b-a}\, S(f, P_n, T_n).$$

Now suppose that $f$ is integrable. It is evident that $\lim_{n \to \infty} \|P_n\| = 0$, and it then follows
from Theorem 8.2.9 (3) and Theorem 8.4.4 that

$$\lim_{n \to \infty} \frac{f(t_1^n) + f(t_2^n) + \cdots + f(t_n^n)}{n} = \lim_{n \to \infty} \frac{1}{b-a}\, S(f, P_n, T_n) = \frac{1}{b-a} \lim_{n \to \infty} S(f, P_n, T_n) = \frac{1}{b-a} \int_a^b f(x)\,dx.$$

Hence, when $f$ is integrable, we deduce that

$$\lim_{n \to \infty} \frac{f(t_1^n) + f(t_2^n) + \cdots + f(t_n^n)}{n}$$

exists and equals $\frac{1}{b-a} \int_a^b f(x)\,dx$, for any choice of sets $T_n$ subject to the restriction
discussed above. It then makes sense to define the average value of an integrable
function to be $\frac{1}{b-a} \int_a^b f(x)\,dx$. ◊
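The sampling procedure described in Example 8.4.6 is easy to imitate numerically. The following sketch chooses one point at random in each of the $n$ subintervals of equal length, so that the sample is a representative set of the partition, and averages the values of the function. The function name `sample_average`, the use of a pseudorandom generator and the choice of test function are assumptions made for illustration only.

```python
import random


def sample_average(f, a: float, b: float, n: int, rng: random.Random) -> float:
    """Average of f over n sample points, one chosen in each of the n
    subintervals of [a, b] of equal length (b - a) / n."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        # rng.random() lies in [0, 1), so t lies in the i-th subinterval.
        t = a + (i + rng.random()) * width
        total += f(t)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    # For f(x) = x^2 on [0, 2], the average value is (1/2)(8/3) = 4/3.
    for n in (10, 100, 10_000):
        print(n, sample_average(lambda x: x * x, 0.0, 2.0, n, rng))
```

Different seeds give different samples, but the averages all approach $\frac{4}{3}$ as $n$ grows, in agreement with the claim that the limit does not depend on the choice of the sets $T_n$.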


For our next two applications of sequences, we will need the following theorem, 
which is about certain “sequences” of closed bounded intervals in R, by which we 
mean families of closed bounded intervals indexed by the natural numbers. This 
theorem, known as the Nested Interval Theorem, can be thought of as the sequence 
version of the No Gap Lemma (Lemma 2.6.6), and indeed our proof of the Nested 
Interval Theorem will make use of the No Gap Lemma. More accurately, one should 
think of the No Gap Lemma as the non-sequence version of the Nested Interval 
Theorem, given that the latter is well-known and widely used and the former is 
neither. A more common proof of the Nested Interval Theorem uses the Monotone 
Convergence Theorem (Corollary 8.3.4), or, more precisely, Theorem 8.3.3, which is 
the substance of the Monotone Convergence Theorem; that proof is left to the reader 
in Exercise 8.4.8. 


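Theorem 8.4.7 (Nested Interval Theorem). Let $\{[a_n, b_n]\}_{n=1}^{\infty}$ be a family of closed
bounded intervals in $\mathbb{R}$. Suppose that $[a_{i+1}, b_{i+1}] \subseteq [a_i, b_i]$ for all $i \in \mathbb{N}$.


1. $\bigcap_{n=1}^{\infty} [a_n, b_n] \neq \emptyset$.
2. If $\lim_{n \to \infty} (b_n - a_n) = 0$, then $\bigcap_{n=1}^{\infty} [a_n, b_n] = \{c\}$, where $c = \lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n$.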


Proof. We prove both parts of the theorem together. By hypothesis we know that
$a_i \leq a_{i+1}$ and $b_{i+1} \leq b_i$ for all $i \in \mathbb{N}$. It then follows from Exercise 8.3.1, and the
analog of that exercise for decreasing sequences, that $\{a_n\}_{n=1}^{\infty}$ is increasing and
$\{b_n\}_{n=1}^{\infty}$ is decreasing.

Let $A = \{a_n \mid n \in \mathbb{N}\}$, and let $B = \{b_n \mid n \in \mathbb{N}\}$. Let $i, j \in \mathbb{N}$. If $i = j$, then clearly
$a_i \leq b_j$. If $i < j$, then $a_i \leq a_j \leq b_j \leq b_i$, and hence $a_i \leq b_j$. If $i > j$, a similar argument
shows that $a_i \leq b_j$. Therefore $A$ and $B$ satisfy the hypothesis of the No Gap Lemma
(Lemma 2.6.6). From Part (1) of that lemma, we know that $A$ has a least upper
bound and $B$ has a greatest lower bound, and $\operatorname{lub} A \leq \operatorname{glb} B$. Let $a = \operatorname{lub} A$, and let
$b = \operatorname{glb} B$. If $n \in \mathbb{N}$, then $a_n \leq a \leq b \leq b_n$, and hence $[a,b] \subseteq [a_n, b_n]$. It follows that
$[a,b] \subseteq \bigcap_{n=1}^{\infty} [a_n, b_n]$. Because $[a,b] \neq \emptyset$, then $\bigcap_{n=1}^{\infty} [a_n, b_n] \neq \emptyset$, which proves Part (1)
of the theorem.

Next, let $x \in \mathbb{R} - [a,b]$. First, suppose that $x < a$. Let $\varepsilon = a - x$. Then $\varepsilon > 0$.
By Lemma 2.6.5 (1) there is some $a_k \in A$ such that $a - \varepsilon < a_k \leq a$, which im-
plies that $a - (a - x) < a_k \leq a$, and hence $x < a_k \leq a$. It follows that $x \notin [a_k, b_k]$,
and hence $x \notin \bigcap_{n=1}^{\infty} [a_n, b_n]$. Second, suppose that $x > b$. A similar argument shows
that $x \notin \bigcap_{n=1}^{\infty} [a_n, b_n]$, and we omit the details. By contrapositive, we deduce that
$\bigcap_{n=1}^{\infty} [a_n, b_n] \subseteq [a,b]$. It follows that $\bigcap_{n=1}^{\infty} [a_n, b_n] = [a,b]$.

Now suppose that $\lim_{n \to \infty} (b_n - a_n) = 0$. Let $\varepsilon > 0$. Then there is some $N \in \mathbb{N}$
such that $n \in \mathbb{N}$ and $n \geq N$ imply $|(b_n - a_n) - 0| < \varepsilon$. In particular $b_N - a_N < \varepsilon$.
It follows from Part (2) of the No Gap Lemma that $a = \operatorname{lub} A = \operatorname{glb} B = b$. Hence
$\bigcap_{n=1}^{\infty} [a_n, b_n] = [a,b] = \{a\}$. Moreover, suppose $n \in \mathbb{N}$ and $n \geq N$. Because $a \in [a_n, b_n]$,
then $|a_n - a| \leq |b_n - a_n| < \varepsilon$ and $|b_n - a| \leq |b_n - a_n| < \varepsilon$. It follows that $a = \lim_{n \to \infty} a_n$
and $a = \lim_{n \to \infty} b_n$. We have therefore proved Part (2) of the theorem.


We note that the Nested Interval Theorem does not hold for the set of rational
numbers, for example because of the existence of a family of intervals $\{[a_n, b_n]\}_{n=1}^{\infty}$
where $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ are sequences of rational numbers that converge to $\sqrt{2}$,
the former from below and the latter from above.
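One concrete family of this kind can be produced by repeated halving. The sketch below is only an illustration under our own choices (the starting interval $[1,2]$, the halving rule and the name `sqrt2_intervals` are not from the text): every endpoint is rational, each interval contains the next, the lengths tend to zero, and every interval satisfies $a_n^2 < 2 < b_n^2$, so that the only real number in the intersection is $\sqrt{2}$, which is not rational.

```python
from fractions import Fraction


def sqrt2_intervals(count: int) -> list[tuple[Fraction, Fraction]]:
    """Return `count` nested intervals [a_n, b_n] with rational endpoints
    such that a_n^2 < 2 < b_n^2 and b_n - a_n = 1 / 2^(n-1)."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(count - 1):
        mid = (a + b) / 2
        # mid is rational, so mid * mid is never exactly 2.
        if mid * mid < 2:
            a = mid
        else:
            b = mid
        intervals.append((a, b))
    return intervals


if __name__ == "__main__":
    for a, b in sqrt2_intervals(8):
        print(a, b, float(a), float(b))
```

Over $\mathbb{R}$ the Nested Interval Theorem locates the single point $\sqrt{2}$ in the intersection; within the rational numbers the same family of intervals has empty intersection.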

Our next application of sequences is a nice proof that the set of real numbers is 
uncountable. The reader is likely to be familiar with the famous proof of the
uncountability of $\mathbb{R}$ known as “Cantor’s diagonal argument”; see [Blo10, Theorem 6.7.3]
for this proof. The proof is very clever, and is important historically, though it does 
have a problem, which is that it makes use of the fact that every real number can be 
represented in decimal notation, and that such representation is unique if decimal 
representations that eventually become the number 9 repeating are not allowed. The 
existence of such decimal representation of real numbers can be proved, as we did 
in Section 2.8, but the proof is not trivial, and therefore a complete treatment of 
Cantor’s diagonal argument is much more involved than it might at first appear. More- 
over, whereas the decimal representation of real numbers is extremely useful from 
a computational perspective, it is not conceptually at the heart of the real numbers, 
and it would be nice to have a proof of the uncountability of R that is more directly 
related to the fundamental properties of the real numbers. The following proof of 
the uncountability of $\mathbb{R}$, which is also due to Georg Cantor (1845–1918) and in fact
precedes his more famous diagonal argument, makes use of the Nested Interval Theo- 
rem (Theorem 8.4.7), but not decimals. This proof is a special case of a more general 
theorem in topology, as in [Mun00, pp. 176-177]. For this proof, we assume that the 
reader is familiar with basic properties of countable sets. In particular, we use the fact 
that a set $A$ is countable if and only if there is a surjective function $f\colon \mathbb{N} \to A$; see
[Blo10, Sections 6.5—6.6] for a proof of this fact, and for general information about 
countable and uncountable sets. 


Theorem 8.4.8. The set R is uncountable. 


Proof. Suppose that $\mathbb{R}$ is countable. Then there is a surjective function $f\colon \mathbb{N} \to \mathbb{R}$.
By Exercise 2.3.10 there is a non-degenerate open bounded interval $(a_1, b_1) \subseteq \mathbb{R}$
such that $f(1) \notin [a_1, b_1]$. Similarly, there is a non-degenerate open bounded interval
$(a_2, b_2) \subseteq (a_1, b_1)$ such that $f(2) \notin [a_2, b_2]$. Continuing in this way, we use Definition
by Recursion to define a family $\{[a_n, b_n]\}_{n=1}^{\infty}$ of non-degenerate closed bounded
intervals in $\mathbb{R}$ such that $f(n) \notin [a_n, b_n]$ and $(a_{n+1}, b_{n+1}) \subseteq (a_n, b_n)$ for all $n \in \mathbb{N}$. Hence
$[a_{n+1}, b_{n+1}] \subseteq [a_n, b_n]$ for all $n \in \mathbb{N}$. It then follows from the Nested Interval Theorem
(Theorem 8.4.7) that $\bigcap_{n=1}^{\infty} [a_n, b_n] \neq \emptyset$. Let $y \in \bigcap_{n=1}^{\infty} [a_n, b_n]$. Because $f(n) \notin [a_n, b_n]$
for all $n \in \mathbb{N}$, it follows that $y \neq f(n)$ for all $n \in \mathbb{N}$. Hence $f$ is not surjective, which
is a contradiction. We deduce that $\mathbb{R}$ is uncountable.
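The recursive choice of intervals in the above proof can be carried out explicitly for any finite list of numbers. The sketch below is an illustration only: a program can handle only finitely many terms, so it demonstrates the construction and not the theorem. The rule used to pick each new interval (two disjoint closed fifths of the previous interval) and the name `avoid` are our own choices, not the ones made via Exercise 2.3.10.

```python
from fractions import Fraction


def avoid(listed, a=Fraction(0), b=Fraction(1)):
    """Given numbers listed[0], listed[1], ..., return closed intervals
    [a_n, b_n], each inside the interior of the previous one, such that
    the n-th interval does not contain the n-th listed number."""
    intervals = []
    for x in listed:
        w = b - a
        left = (a + w / 5, a + 2 * w / 5)
        right = (a + 3 * w / 5, a + 4 * w / 5)
        # left and right are disjoint, so x lies in at most one of them.
        a, b = right if left[0] <= x <= left[1] else left
        intervals.append((a, b))
    return intervals


if __name__ == "__main__":
    listed = [Fraction(k, 7) for k in range(8)] + [Fraction(1, 2), Fraction(3, 10)]
    intervals = avoid(listed)
    a, b = intervals[-1]
    y = (a + b) / 2  # a point lying in every interval constructed so far
    print(y, all(y != x for x in listed))
```

Any point of the last interval lies in all of the intervals, and therefore differs from every number on the list, exactly as the number $y$ does in the proof.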


As another application of sequences, we now provide the details that were missing 
from Example 5.8.2 (5), which was part of our discussion of sets of measure zero. 


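Example 8.4.9. As part of our discussion of sets of measure zero in Section 5.8, it
was stated without proof in Example 5.8.2 (5) that there are uncountable subsets of $\mathbb{R}$
that have measure zero. We now show a famous example of such a set, which is called
the Cantor set. To define the Cantor set, we first use Definition by Recursion to define
a family $\{C_n\}_{n=1}^{\infty}$ of subsets of the interval $[0,1]$. Let $C_1$ be defined by removing the
“open middle third” from $[0,1]$; that is, let

$$C_1 = \left[0, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, 1\right].$$

Let $C_2$ be defined by removing the open middle third from each of the two disjoint
intervals in $C_1$; that is, let

$$C_2 = \left[0, \tfrac{1}{9}\right] \cup \left[\tfrac{2}{9}, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, \tfrac{7}{9}\right] \cup \left[\tfrac{8}{9}, 1\right].$$

Let $C_3$ be defined by removing the open middle third from each of the four disjoint
intervals in $C_2$; that is, let

$$C_3 = \left[0, \tfrac{1}{27}\right] \cup \left[\tfrac{2}{27}, \tfrac{1}{9}\right] \cup \left[\tfrac{2}{9}, \tfrac{7}{27}\right] \cup \left[\tfrac{8}{27}, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, \tfrac{19}{27}\right] \cup \left[\tfrac{20}{27}, \tfrac{7}{9}\right] \cup \left[\tfrac{8}{9}, \tfrac{25}{27}\right] \cup \left[\tfrac{26}{27}, 1\right].$$

Continuing this way, we define a family of sets $\{C_n\}_{n=1}^{\infty}$ such that $C_n$ is the
union of $2^n$ closed bounded intervals of length $\frac{1}{3^n}$, and $C_{n+1} \subseteq C_n$ for all $n \in \mathbb{N}$. See
Figure 8.4.1 for an illustration of $C_1$, $C_2$ and $C_3$.

The Cantor set is the set $C$ defined by $C = \bigcap_{n=1}^{\infty} C_n$. The set $C$ is not empty,
because for each $n \in \mathbb{N}$, the $2^n$ intervals in $C_n$ have $2^{n+1}$ endpoints, and each of these
endpoints is also in $C_k$ for all $k \in \mathbb{N}$ such that $k \geq n$, and so these endpoints are all in
$C$.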


We now show that $C$ has measure zero, as defined in Definition 5.8.1. Let $\varepsilon > 0$.
By Example 8.2.13 we know that $\lim_{n \to \infty} \left(\frac{2}{3}\right)^n = 0$. Then there is some $N \in \mathbb{N}$ such that
$n \in \mathbb{N}$ and $n \geq N$ imply $\left|\left(\frac{2}{3}\right)^n - 0\right| < \frac{\varepsilon}{2}$. Hence $2^N \cdot \frac{2}{3^N} < \varepsilon$. The set $C_N$ is the union of
$2^N$ closed intervals of the form $\left[\frac{a-1}{3^N}, \frac{a}{3^N}\right]$, where the values of $a$ are $2^N$ of the elements
of $\{1, 2, \ldots, 3^N\}$; we do not need the precise list of possible values of $a$, but only the
number of such values. Each closed interval of the form $\left[\frac{a-1}{3^N}, \frac{a}{3^N}\right]$ is contained in the
open interval $\left(\frac{a-1}{3^N} - \frac{1}{2 \cdot 3^N}, \frac{a}{3^N} + \frac{1}{2 \cdot 3^N}\right)$, which has length $\frac{2}{3^N}$. Hence $C_N$ is contained in the union
of $2^N$ open intervals of length $\frac{2}{3^N}$. Because $C$ is a subset of $C_N$, it follows that $C$ is
contained in the union of $2^N$ open intervals of length $\frac{2}{3^N}$. The sum of the lengths of
these $2^N$ open intervals is $2^N \cdot \frac{2}{3^N} < \varepsilon$. It follows that $C$ has measure zero.

Fig. 8.4.1.

There are two standard approaches to proving that C is uncountable. One approach, 
which is left to the reader in Exercise 8.4.10, uses the Nested Interval Theorem 
(Theorem 8.4.7), and is similar to the proof of Theorem 8.4.8. Another approach, 
which we will not discuss, makes use of the base 3 representation of real numbers, 
often called the “ternary expansion” of real numbers; see [TBB01, Section 6.5.2] or
[Sto01, Section 3.3] for details. ◊
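The sets $C_n$ used to define the Cantor set can be generated mechanically, which makes it easy to confirm the count and total length of the intervals at each stage. The following sketch uses exact rational arithmetic; the function names are ours and are introduced only for illustration.

```python
from fractions import Fraction


def cantor_stage(n: int) -> list[tuple[Fraction, Fraction]]:
    """Return the closed intervals whose union is C_n, listed from left to
    right; stage 0 is the interval [0, 1] itself."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            # Remove the open middle third, keeping the two outer closed thirds.
            next_intervals.append((a, a + third))
            next_intervals.append((b - third, b))
        intervals = next_intervals
    return intervals


def total_length(intervals) -> Fraction:
    """Sum of the lengths of the given intervals."""
    return sum(b - a for a, b in intervals)


if __name__ == "__main__":
    for n in range(1, 6):
        stage = cantor_stage(n)
        print(n, len(stage), total_length(stage))
```

The output shows $2^n$ intervals of total length $\left(\frac{2}{3}\right)^n$ at stage $n$, which is the quantity that the measure zero argument drives below any given $\varepsilon$.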


We conclude this section with an application of the Cauchy Completeness Theo- 
rem (Corollary 8.3.16) to the Fibonacci numbers. 


Example 8.4.10. The Fibonacci numbers are the terms of the widely studied sequence
that starts

$$1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, \ldots.$$

Formally, the Fibonacci numbers are the terms of the unique sequence $\{F_n\}_{n=1}^{\infty}$ that
results from using Definition by Recursion with the conditions $F_1 = 1$ and $F_2 = 1$,
and $F_{n+2} = F_n + F_{n+1}$ for all $n \in \mathbb{N}$. That such a formula defines a unique sequence
follows from Exercise 2.5.20.

The Fibonacci numbers have a long history, and are well studied by mathemati- 
cians. These numbers have many interesting mathematical properties, as seen in 
[Knu73, Section 1.2.8 and exercises], [GKP94, Section 6.6] and [HHP97, Chapter 3]. 
The Fibonacci numbers are also viewed by some authors as having something of a 
spiritual significance, the validity of which the author of this text is not qualified to 
judge; the interested reader might wish to consult [Gar87] or [Hun70]. 

Our interest in the Fibonacci numbers concerns convergence. Clearly the Fibonacci
numbers do not converge as a sequence, because the sequence of Fibonacci numbers
is not bounded, and Lemma 8.2.6 states that any convergent sequence is bounded.
However, it is possible to construct an interesting convergent sequence out of the
Fibonacci numbers by considering the sequence of ratios of the Fibonacci numbers
with their predecessors, which is

$$\frac{1}{1}, \frac{2}{1}, \frac{3}{2}, \frac{5}{3}, \frac{8}{5}, \frac{13}{8}, \frac{21}{13}, \frac{34}{21}, \ldots.$$

If the reader expresses these fractions as decimals, it will be seen that the ratios appear
to be getting closer and closer to approximately 1.618..., with the ratios alternately
above and below this number.

The number 1.618... might look familiar to the reader; it is the famous “golden
ratio,” which is often denoted $\phi$. It would take us too far afield to discuss the origins
of the golden ratio, which, similarly to the Fibonacci numbers, is both interesting
mathematically (it goes back to the ancient Greeks), and is also viewed by some as
having significance beyond the mathematical. See [Hun70] for more on the golden
ratio, from both points of view.

The simplest way to define $\phi$ in modern terms is to view it as the unique positive
solution of the quadratic equation

$$x^2 - x - 1 = 0. \qquad (8.4.1)$$

This equation does not give any indication as to the origin of the golden ratio, but it
is the aspect of the golden ratio that we need at present. By applying the quadratic
formula to Equation 8.4.1, it is seen that $\phi = \frac{1 + \sqrt{5}}{2}$, which is approximately 1.618....
By Theorem 2.6.11 we know that $\sqrt{5}$ is irrational, and it follows that $\phi$ is irrational.


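Our goal is to show that

$$\lim_{n \to \infty} \frac{F_{n+1}}{F_n} = \phi.$$

The standard “proof” of this fact, which one often sees in elementary treatments of
the subject, is as follows. “Suppose that $\lim_{n \to \infty} \frac{F_{n+1}}{F_n} = L$. Then

$$\begin{aligned}
1 + L &= 1 + \lim_{n \to \infty} \frac{F_{n+1}}{F_n} = \lim_{n \to \infty} \frac{F_n}{F_n} + \lim_{n \to \infty} \frac{F_{n+1}}{F_n} = \lim_{n \to \infty} \frac{F_n + F_{n+1}}{F_n} \\
&= \lim_{n \to \infty} \frac{F_{n+2}}{F_n} = \lim_{n \to \infty} \left( \frac{F_{n+2}}{F_{n+1}} \cdot \frac{F_{n+1}}{F_n} \right) \\
&= \left( \lim_{n \to \infty} \frac{F_{n+2}}{F_{n+1}} \right) \left( \lim_{n \to \infty} \frac{F_{n+1}}{F_n} \right) = L^2.
\end{aligned}$$

Hence $1 + L = L^2$, which is equivalent to $L^2 - L - 1 = 0$. Therefore $L$ satisfies Equa-
tion 8.4.1. Because all of the $F_n$ are positive then $L$ cannot be negative, and it follows
that $L$ is the positive root of Equation 8.4.1, which means that $L = \phi$.”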

Before reading further, the reader is encouraged to find the flaw in the above 
proof. 

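As with many incomplete proofs, the above “proof” is indeed the proof of
something; it is simply not the complete proof of what we want to show. What
the above proof shows is that if $\lim_{n \to \infty} \frac{F_{n+1}}{F_n}$ exists, then it must equal $\phi$. The flaw in the
above proof is that it does not give any reason for us to believe that $\lim_{n \to \infty} \frac{F_{n+1}}{F_n}$ exists. We
will now fill in this gap in the proof by showing that $\left\{ \frac{F_{n+1}}{F_n} \right\}_{n=1}^{\infty}$ is a Cauchy sequence,
and it will then follow from the Cauchy Completeness Theorem (Corollary 8.3.16)
that $\lim_{n \to \infty} \frac{F_{n+1}}{F_n}$ exists.

Let $\varepsilon > 0$. By Corollary 2.6.8 (2) there is some $N \in \mathbb{N}$ such that $\frac{1}{N} < \varepsilon$. By taking
a larger value of $N$ if necessary, we may assume that $N \geq 5$. Suppose that $n, m \in \mathbb{N}$
and $n, m \geq N$. If $n = m$ then clearly

$$\left| \frac{F_{n+1}}{F_n} - \frac{F_{m+1}}{F_m} \right| = 0 < \varepsilon.$$

Now suppose that $n \neq m$. Without loss of generality, assume that $m > n$. Then, using
Exercise 8.4.12 (5), Exercise 2.5.8 and Exercise 8.4.12 (2) in that order, we see that

$$\begin{aligned}
\left| \frac{F_{n+1}}{F_n} - \frac{F_{m+1}}{F_m} \right|
&= \left| \left( \frac{F_{n+1}}{F_n} - \frac{F_{n+2}}{F_{n+1}} \right) + \left( \frac{F_{n+2}}{F_{n+1}} - \frac{F_{n+3}}{F_{n+2}} \right) + \cdots + \left( \frac{F_m}{F_{m-1}} - \frac{F_{m+1}}{F_m} \right) \right| \\
&= \left| \frac{(F_{n+1})^2 - F_{n+2} F_n}{F_n F_{n+1}} + \frac{(F_{n+2})^2 - F_{n+3} F_{n+1}}{F_{n+1} F_{n+2}} + \cdots + \frac{(F_m)^2 - F_{m+1} F_{m-1}}{F_{m-1} F_m} \right| \\
&= \left| \frac{(-1)^{n+2}}{F_n F_{n+1}} + \frac{(-1)^{n+3}}{F_{n+1} F_{n+2}} + \cdots + \frac{(-1)^{m+1}}{F_{m-1} F_m} \right| \\
&= \left| \frac{1}{F_n F_{n+1}} - \frac{1}{F_{n+1} F_{n+2}} + \cdots + \frac{(-1)^{m-n+1}}{F_{m-1} F_m} \right| \\
&\leq \frac{1}{F_n F_{n+1}} \leq \frac{1}{F_n} \leq \frac{1}{N} < \varepsilon.
\end{aligned}$$ ◊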
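The estimate obtained in Example 8.4.10 can be tested on the first few Fibonacci numbers with exact rational arithmetic. The sketch below is only a numerical check under our own naming (`fibonacci`, `ratios`); it verifies, for small indices, the identity of Exercise 8.4.12 (5) in the form $(F_{n+1})^2 - F_{n+2} F_n = (-1)^{n+2}$ and the bound $\left| \frac{F_{n+1}}{F_n} - \frac{F_{m+1}}{F_m} \right| \leq \frac{1}{F_n F_{n+1}}$ for $m > n$.

```python
from fractions import Fraction


def fibonacci(count: int) -> list[int]:
    """Return [F_1, F_2, ..., F_count], where F_1 = F_2 = 1."""
    fibs = [1, 1]
    while len(fibs) < count:
        fibs.append(fibs[-2] + fibs[-1])
    return fibs[:count]


def ratios(count: int) -> list[Fraction]:
    """Return [F_2/F_1, F_3/F_2, ..., F_(count+1)/F_count] as exact fractions."""
    f = fibonacci(count + 1)
    return [Fraction(f[i + 1], f[i]) for i in range(count)]


if __name__ == "__main__":
    phi = (1 + 5 ** 0.5) / 2
    for n, r in enumerate(ratios(15), start=1):
        # The ratios alternate below and above phi while closing in on it.
        print(n, r, float(r) - phi)
```

The signs of the printed differences alternate, which matches the observation that the ratios lie alternately below and above the golden ratio.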


Reflections 


This section has a number of “fun” topics, for example the Cantor set and the 
Fibonacci numbers, which help show the diverse uses of sequences. The reader might 
wonder, however, about the absence of such a concentration of “fun” topics in any 
other section of this text. Are other aspects of real analysis inherently less enjoyable 
than sequences? The answer, fortunately, is no. There are many fun and useful 
applications of the material (such as derivatives and integrals) that was discussed in 
the previous chapters of this text, but the reader has undoubtedly seen such applications 
in a calculus course, and so it is not necessary to include such material here. The 
treatment of sequences in most introductory calculus courses, by contrast, is quite 
cursory, and so the reader might not have seen how valuable sequences are prior to 
studying real analysis, and hence the inclusion of the topics discussed in the present 
section. 

Moreover, in many real analysis texts sequences are treated before derivatives and 
integrals, with proofs about the latter two topics making use of sequences. In such 
texts, the importance of sequences as a tool in real analysis is immediately apparent. In 
this text, by contrast, where sequences were placed after the discussion of derivatives 
and integrals in order to emphasize the role of the Least Upper Bound Property in 
proofs about those topics, it was necessary to include some additional brief topics to 
demonstrate the value of sequences, and that presented a good opportunity for some 
topics that are particularly fun.


Exercises


Exercise 8.4.1. [Used in Section 8.4.] Find an example of a function $f\colon \mathbb{R} \to \mathbb{R}$ for
which $\lim_{x \to 0} f(x)$ does not exist, and yet there is a sequence $\{c_n\}_{n=1}^{\infty}$ in $\mathbb{R} - \{0\}$ such
that $\lim_{n \to \infty} c_n = 0$ and $\lim_{n \to \infty} f(c_n)$ exists.

Exercise 8.4.2. [Used in Corollary 8.4.2.] Prove Corollary 8.4.2. 


Exercise 8.4.3. [Used in Theorem 10.5.2.] Let $I \subseteq \mathbb{R}$ be an open interval, let $c \in I$ and
let $f\colon I \to \mathbb{R}$ be a function. Suppose that there is a sequence $\{a_n\}_{n=1}^{\infty}$ in $\mathbb{R} - \{0\}$ such
that $c + a_n \in I$ for all $n \in \mathbb{N}$, that $\lim_{n \to \infty} a_n = 0$ and that the sequence

$$\left\{ \frac{f(c + a_n) - f(c)}{a_n} \right\}_{n=1}^{\infty}$$

is divergent. Prove that $f$ is not differentiable at $c$.


Exercise 8.4.4. Let $A \subseteq \mathbb{R}$ be a non-empty set. Prove that if $A$ has a least upper bound,
then there is a sequence $\{a_n\}_{n=1}^{\infty}$ in $A$ such that $\lim_{n \to \infty} a_n = \operatorname{lub} A$.


Exercise 8.4.5. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$f\colon [a,b] \to \mathbb{R}$ be a function. Suppose that $f$ is continuous. Prove that there is some
$c \in [a,b]$ such that $f(c)$ equals the average value of $f$.


Exercise 8.4.6. [Used in Exercise 8.4.7.] Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed
bounded interval, let $f\colon [a,b] \to \mathbb{R}$ be a function, let $\{[a_n, b_n]\}_{n=1}^{\infty}$ be a family of
closed bounded intervals in $[a,b]$ and let $\{\varepsilon_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$. Suppose that
$\lim_{n \to \infty} \varepsilon_n = 0$, and that $n \in \mathbb{N}$ and $x, y \in [a_n, b_n]$ imply $|f(x) - f(y)| < \varepsilon_n$.


(1) Suppose that $[a_{i+1}, b_{i+1}] \subseteq (a_i, b_i)$ for all $i \in \mathbb{N}$. Let $c \in \bigcap_{n=1}^{\infty} [a_n, b_n]$. Prove
that $f$ is continuous at $c$.

(2) Suppose that the hypothesis of Part (1) of this exercise is changed to the
weaker hypothesis $[a_{i+1}, b_{i+1}] \subseteq [a_i, b_i]$ for all $i \in \mathbb{N}$. Does the conclusion still
hold? Give a proof or a counterexample.


Exercise 8.4.7. [Used in Section 3.3.] The purpose of this exercise is to prove that
there is no function $g\colon [0,1] \to \mathbb{R}$ that is continuous at every rational number in
$[0,1]$, and discontinuous at every irrational number in $[0,1]$. This proof is due to Vito
Volterra (1860–1940).

Suppose that there is a function $g\colon [0,1] \to \mathbb{R}$ that is continuous at every rational
number in $[0,1]$, and discontinuous at every irrational number in $[0,1]$. Let $s\colon [0,1] \to \mathbb{R}$
be the function given in Example 3.3.3 (7). We saw that $s$ is discontinuous at every
rational number in $[0,1]$, and continuous at every irrational number in $[0,1]$.


(1) Let $p_1 \in \mathbb{Q} \cap (0,1)$. Prove that there is a non-degenerate closed bounded
interval $[c_1, d_1] \subseteq (0,1)$ such that $p_1 \in (c_1, d_1)$, and that $x, y \in [c_1, d_1]$ implies
$|s(x) - s(y)| < 1$.
(2) By Theorem 2.6.13 (2) there is some $q_1 \in (\mathbb{R} - \mathbb{Q}) \cap [c_1, d_1]$. Prove that
there is a non-degenerate closed bounded interval $[a_1, b_1] \subseteq (c_1, d_1)$ such that
$q_1 \in (a_1, b_1)$, and that $x, y \in [a_1, b_1]$ implies $|g(x) - g(y)| < 1$. Observe that
$x, y \in [a_1, b_1]$ implies $|s(x) - s(y)| < 1$.

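(3) Prove that there is a family of closed bounded intervals $\{[a_n, b_n]\}_{n=1}^{\infty}$ in $[0,1]$
such that $[a_{i+1}, b_{i+1}] \subseteq (a_i, b_i)$ for all $i \in \mathbb{N}$, and that $n \in \mathbb{N}$ and $x, y \in [a_n, b_n]$
imply $|s(x) - s(y)| < \frac{1}{n}$ and $|g(x) - g(y)| < \frac{1}{n}$.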

(4) Obtain a contradiction. [Use Exercise 8.4.6 (1).] 


Exercise 8.4.8. [Used in Section 8.4.] Give an alternative proof of the Nested Interval 
Theorem (Theorem 8.4.7) that relies upon Theorem 8.3.3 instead of the No Gap 
Lemma. 


Exercise 8.4.9. [Used in Section 8.3.] The purpose of this exercise is to give a proof 
of the Bolzano—Weierstrass Theorem (Theorem 8.3.9) that uses the Nested Interval 
Theorem (Theorem 8.4.7) instead of the Monotone Convergence Theorem (Corol- 
lary 8.3.4). 

Let $\{a_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$. Suppose that $\{a_n\}_{n=1}^{\infty}$ is bounded. Hence there
are $s_0, t_0 \in \mathbb{R}$ such that $s_0 \leq a_k \leq t_0$ for all $k \in \mathbb{N}$. We divide $[s_0, t_0]$ into two subin-
tervals $\left[s_0, \frac{s_0 + t_0}{2}\right]$ and $\left[\frac{s_0 + t_0}{2}, t_0\right]$. It must be the case that at least one of these two
subintervals contains $a_k$ for infinitely many $k \in \mathbb{N}$. Choose a subinterval for which
this condition holds; it does not matter which one is chosen if the condition holds for
both subintervals. Rename the chosen subinterval $[s_1, t_1]$. Then $[s_1, t_1] \subseteq [s_0, t_0]$, and
$t_1 - s_1 = \frac{t_0 - s_0}{2}$. Continuing in this way, we use Definition by Recursion to define
a family $\{[s_n, t_n]\}_{n=1}^{\infty}$ of non-degenerate closed bounded intervals in $\mathbb{R}$ such that for
each $n \in \mathbb{N}$, the following three conditions hold: (1) the interval $[s_n, t_n]$ contains $a_k$ for
infinitely many $k \in \mathbb{N}$; (2) $[s_{n+1}, t_{n+1}] \subseteq [s_n, t_n]$; and (3) $t_{n+1} - s_{n+1} = \frac{t_n - s_n}{2}$.

Use the above idea to prove the Bolzano—Weierstrass Theorem. 

[Use Exercise 8.2.9.] 


Exercise 8.4.10. [Used in Example 8.4.9 and Exercise 8.4.11.] Use the Nested Interval 
Theorem (Theorem 8.4.7) to prove that the Cantor set, as defined in Example 8.4.9, is 
uncountable. 


Exercise 8.4.11. [Used in Section 5.8.] Find an example of a function f: [0,1] — R 
such that f is integrable, but that the set of numbers at which the function is discon- 
tinuous is uncountable. [Use Exercise 2.6.13, Exercise 3.3.2 and Exercise 8.4.10.] 


Exercise 8.4.12. [Used in Example 8.4.10 and Exercise 9.5.7.] This exercise has some
properties of the sequence of Fibonacci numbers $\{F_n\}_{n=1}^{\infty}$.


(1) Prove that $F_n \leq 2^{n-1}$ for all $n \in \mathbb{N}$.

(2) Prove that $F_n \geq n$ for all $n \in \mathbb{N}$ such that $n \geq 5$.

(3) Prove that $F_1 + F_2 + \cdots + F_n = F_{n+2} - 1$ for all $n \in \mathbb{N}$.
(4) Prove that $(F_1)^2 + (F_2)^2 + \cdots + (F_n)^2 = F_n F_{n+1}$ for all $n \in \mathbb{N}$.
(5) Prove that $(F_{n+1})^2 - F_{n+2} F_n = (-1)^{n+2}$ for all $n \in \mathbb{N}$.


Exercise 8.4.13. [Used in Exercise 9.3.10.] For each $n \in \mathbb{N}$, let

$$\gamma_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \ln n.$$

Using what the reader is asked to prove below, it follows from Theorem 8.3.3 (3)
that the sequence $\{\gamma_n\}_{n=1}^{\infty}$ is convergent. The limit of this sequence, which is often
denoted $\gamma$, is known as Euler’s constant (or the Euler–Mascheroni constant), and has
been widely studied; see [Hav03] for more about this number. We will make use of
the sequence $\{\gamma_n\}_{n=1}^{\infty}$ in Exercise 9.3.10, where we find the exact value of the sum of
the alternating harmonic series.


(1) Prove that $\{\gamma_n\}_{n=1}^{\infty}$ is strictly decreasing.
(2) Prove that $\{\gamma_n\}_{n=1}^{\infty}$ is bounded below by 0.
[Use Exercise 7.2.3.]
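Before attempting Exercise 8.4.13, the reader might find it helpful to look at the first few terms of the sequence. The sketch below assumes the standard definition $\gamma_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n$ and uses floating-point arithmetic, so it offers numerical evidence only, not a proof; the name `gamma_terms` is ours.

```python
import math


def gamma_terms(count: int) -> list[float]:
    """Return [gamma_1, ..., gamma_count], where
    gamma_n = 1 + 1/2 + ... + 1/n - ln(n)."""
    terms = []
    harmonic = 0.0
    for n in range(1, count + 1):
        harmonic += 1.0 / n  # running value of 1 + 1/2 + ... + 1/n
        terms.append(harmonic - math.log(n))
    return terms


if __name__ == "__main__":
    terms = gamma_terms(100_000)
    for n in (1, 10, 100, 1_000, 100_000):
        print(n, terms[n - 1])
```

The printed values decrease and remain positive, settling near 0.5772, which is the approximate value of Euler’s constant.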


Exercise 8.4.14. Suppose we are given an equation of the form $f(x) = 0$, for some
function $f\colon C \to \mathbb{R}$, where $C \subseteq \mathbb{R}$ is a non-degenerate interval. Suppose further that
we know, for example by looking at the graph of $f$, that the equation $f(x) = 0$ has
a solution, that is, there exists a number $r \in C$ such that $f(r) = 0$. If we cannot find
the exact value of $r$, then the next best thing would be to approximate it. One general
method of approximating the solution of an equation is iteration, which means starting
with a guess for the solution, then modifying the guess in order to obtain what is
hopefully a better approximation of the solution, then modifying that and so on. More
precisely, we use Definition by Recursion to define a sequence $\{x_n\}_{n=1}^{\infty}$ in $C$, with the
hope that the sequence will converge to $r$. In practice, we never compute the whole
sequence, but only finitely many of its terms, and, if all goes well, the value of $x_n$ for
sufficiently large $n$ will be a good approximation of $r$.

In this exercise we discuss the Bisection Method, which gives one way of defining
the sequence $\{x_n\}_{n=1}^{\infty}$. The advantages of the Bisection Method are that it is simple to
understand and implement, and that it requires only that the function be continuous,
not necessarily differentiable; the disadvantage is that it is slower than some other
methods, for example Newton’s Method (discussed in Exercise 8.4.15), where slower
in this context means that for the same amount of accuracy, more terms of the
sequence are needed. See [Est02, Chapter 13] or [CK07, Section 3.1] for details about
the Bisection Method.

The Bisection Method works as follows. Suppose that $f$ is continuous. Suppose
further that $C = [a,b]$, and that $f(a)$ and $f(b)$ are non-zero and have different signs;
that is, suppose that $f(a) > 0$ and $f(b) < 0$, or $f(a) < 0$ and $f(b) > 0$. Let $s_1 = a$, let
$t_1 = b$ and let $x_1 = \frac{s_1 + t_1}{2}$. If $f(x_1) = 0$, we have found what we are looking for, and
we stop. Now suppose that $f(x_1) \neq 0$. Then $f(x_1)$ has a different sign from precisely
one of $f(s_1)$ or $f(t_1)$. If $f(x_1)$ has a different sign from $f(s_1)$, then we let $s_2 = s_1$ and
$t_2 = x_1$; if $f(x_1)$ has a different sign from $f(t_1)$, then we let $s_2 = x_1$ and $t_2 = t_1$. In
either case, we let $x_2 = \frac{s_2 + t_2}{2}$. Continuing in this way, we use Definition by Recursion
to define a family $\{[s_n, t_n]\}_{n=1}^{\infty}$ of non-degenerate closed bounded intervals in $\mathbb{R}$, and
a sequence $\{x_n\}_{n=1}^{\infty}$ in $\mathbb{R}$, such that for each $n \in \mathbb{N}$, the following four conditions hold:
(1) $f(s_n)$ and $f(t_n)$ are non-zero and have different signs; (2) $[s_{n+1}, t_{n+1}] \subseteq [s_n, t_n]$; (3)
$t_{n+1} - s_{n+1} = \frac{t_n - s_n}{2}$; and (4) $x_n = \frac{s_n + t_n}{2}$ and $f(x_n) \neq 0$.

Prove the following theorem.

Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let $f\colon [a,b] \to \mathbb{R}$
be a function. Suppose that $f$ is continuous, and that either $f(a) > 0$ and $f(b) < 0$,
or $f(a) < 0$ and $f(b) > 0$. Then $\{x_n\}_{n=1}^{\infty}$ is convergent, and if $r = \lim_{n \to \infty} x_n$ then $f(r) = 0$.
[Use Exercise 8.2.9.]
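The recursive description of the Bisection Method translates almost line for line into a short program. The following is a minimal sketch in floating-point arithmetic, assuming a fixed number of steps chosen by the user; the function name `bisection` and its parameters are ours. The variables `s` and `t` play the roles of $s_n$ and $t_n$, and the returned list holds the midpoints $x_1, x_2, \ldots$.

```python
def bisection(f, a: float, b: float, steps: int) -> list[float]:
    """Return the midpoints x_1, x_2, ... produced by the Bisection Method on
    [a, b], stopping early if some midpoint is an exact zero of f."""
    fs, ft = f(a), f(b)
    if fs == 0 or ft == 0 or (fs > 0) == (ft > 0):
        raise ValueError("f(a) and f(b) must be non-zero and have different signs")
    s, t = a, b
    xs = []
    for _ in range(steps):
        x = (s + t) / 2
        xs.append(x)
        fx = f(x)
        if fx == 0:
            break  # we have found what we are looking for
        if (fx > 0) != (fs > 0):
            t = x  # f(x) and f(s) differ in sign: keep [s, x]
        else:
            s, fs = x, fx  # f(x) and f(t) differ in sign: keep [x, t]
    return xs


if __name__ == "__main__":
    # Approximate the positive solution of x^2 - 2 = 0 in [1, 2].
    for n, x in enumerate(bisection(lambda x: x * x - 2, 1.0, 2.0, 20), start=1):
        print(n, x)
```

Because the length of the interval is halved at each step, the $n$-th midpoint is within $\frac{b-a}{2^n}$ of a solution, which is the sense in which the method is reliable but slow.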


Exercise 8.4.15. Newton’s Method is a very useful method for finding approximate
solutions of equations. (Newton did not actually propose the method as we now know
it; the modern form of the method is due to Joseph Raphson (1648–1715) and Thomas
Simpson (1710–1761).) Newton’s Method is based upon iteration. Before continuing
with this exercise, read the first paragraph of Exercise 8.4.14 for the general idea
of iteration in the context of solving an equation; we will use the notation of that
exercise. Newton’s Method is one way of defining the sequence $\{x_n\}_{n=1}^{\infty}$ discussed in
the first paragraph of Exercise 8.4.14.

The advantage of Newton’s Method is that it is fast, in the sense of needing 
relatively few terms in the sequence to obtain the desired amount of accuracy; the 
disadvantages are that it requires that the function is twice differentiable and that it 
satisfies some additional hypotheses, and that a poor choice of x; can cause problems. 
See [Est02, Chapter 31] or [CK07, Section 3.2] for details about Newton’s Method,
see [HH99, Section 2.8] for a discussion of the rapidity of convergence of Newton’s 
Method and see [Fal03, Section 14.5] for the relation between Newton’s Method and 
fractals. 

Newton’s Method works as follows. Suppose that $f$ is differentiable, and that
$f'(x) \neq 0$ for all $x \in C$. We choose $x_1 \in C$ arbitrarily, though hopefully close to $r$. We
then define the sequence $\{x_n\}_{n=1}^{\infty}$ by the formula

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \qquad (8.4.2)$$

for all $n \in \mathbb{N}$.
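Equation 8.4.2 can be iterated directly. The sketch below is an illustration under our own assumptions: the derivative is supplied by the caller as a separate function, a fixed number of steps is taken, and no check is made that the terms stay in the domain of $f$ (which is one of the potential problems with the method). The name `newton` and its parameters are ours.

```python
def newton(f, fprime, x1: float, steps: int) -> list[float]:
    """Return [x_1, x_2, ..., x_(steps+1)], where each new term is obtained
    from the previous one by x_(n+1) = x_n - f(x_n) / f'(x_n)."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        # Raises ZeroDivisionError if the derivative vanishes at x.
        xs.append(x - f(x) / fprime(x))
    return xs


if __name__ == "__main__":
    # Approximate the positive solution of x^2 - 2 = 0, starting at 1.5.
    for n, x in enumerate(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5, 5), start=1):
        print(n, x)
```

For this example a handful of terms already agree with $\sqrt{2}$ to the limits of floating-point precision, whereas halving an interval of length 1 would need roughly fifty steps for comparable accuracy.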

The above definition of on ae has two potential problems. First, even if x, € C 
for some n €N, it is not immediately evident that x,.,; € C, and without that it would 
not be possible to define x,,12. Second, even if x, is defined for all n €N, it is not 
evident that the sequence {x,};"_, is convergent, not to mention that it converges to a 
solution of the equation. Fortunately, we will see that if f is sufficiently well behaved, 
and if x; is chosen sufficiently close to a solution, then everything works out well. 
Our theorem is as follows. 

Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let $f\colon [a,b] \to \mathbb{R}$ 
be a function. Suppose that $f$ is twice differentiable, that $f''$ is bounded, that $f'$ is 
bounded away from zero (see Definition 5.5.3), and that either $f(a) > 0$ and $f(b) < 0$, 
or $f(a) < 0$ and $f(b) > 0$. Then the following two facts hold. 


(a) There is a unique $r \in (a,b)$ such that $f(r) = 0$. 
(b) There is a non-degenerate open bounded interval $I \subseteq [a,b]$ such that $r \in I$, 
and such that for any $x_1 \in I$, the sequence $\{x_n\}_{n=1}^{\infty}$ defined by Equation 8.4.2 
exists in $I$, and $\{x_n\}_{n=1}^{\infty}$ is convergent, and $\lim_{n\to\infty} x_n = r$. 


The proof of the theorem will be done in steps, starting with the geometric motivation 
for the formula in Equation 8.4.2 before the actual proof. 


(1) Let $n \in \mathbb{N}$. Suppose that $x_n \in [a,b]$ has been defined. Because $f'$ is bounded 
away from zero, the tangent line to the graph of $f$ at the point $(x_n, f(x_n))$ 
will intersect the $x$-axis. Prove that this point of intersection is $x_{n+1}$. See 
Figure 8.4.2. 


Fig. 8.4.2. [Figure: the points $x_{n+1}$ and $x_n$ marked on the $x$-axis.] 


(2) Prove Part (a) of this exercise. 

(3) By hypothesis on $f$, there is some $M \in \mathbb{R}$ such that $|f''(x)| \le M$ for all $x \in [a,b]$, 
and there is some $P > 0$ such that $|f'(x)| \ge P$ for all $x \in [a,b]$. We may assume 
that $M > 0$. Because $r \in (a,b)$, then by Lemma 2.3.7 (2) there is some $\delta > 0$ 
such that $(r-\delta, r+\delta) \subseteq (a,b)$. By taking a smaller value of $\delta$ if necessary, 
we may assume that $\delta < \frac{P}{M}$. Let $I = (r-\delta, r+\delta)$. 


Let $x_1 \in I$. Clearly $|x_1 - r| \le \left(\frac{M\delta}{2P}\right)^{1-1} |x_1 - r|$. Let $k \in \mathbb{N}$. Suppose that $x_k \in I$ 
has been defined, and that $|x_k - r| \le \left(\frac{M\delta}{2P}\right)^{k-1} |x_1 - r|$. Let $x_{k+1}$ be defined by 
Equation 8.4.2. By Taylor’s Theorem (Theorem 4.4.6), using $n = 1$, and $x = r$ 
and $c = x_k$, there is some $p$ strictly between $r$ and $x_k$ (except that $p = x_k$ when 
$r = x_k$) such that 

$$f(r) = f(x_k) + f'(x_k)(r - x_k) + \frac{f''(p)}{2}(r - x_k)^2.$$

Prove that $|x_{k+1} - r| \le \frac{M}{2P} |x_k - r|^2$. 
(4) Using Part (3) of this exercise, prove that $|x_{k+1} - r| < \delta$ and that 
$|x_{k+1} - r| \le \left(\frac{M\delta}{2P}\right)^{(k+1)-1} |x_1 - r|$. It follows that $x_{k+1} \in I$. 

(5) It follows from Definition by Recursion, using Parts (3) and (4) of this exercise, 
that $\{x_n\}_{n=1}^{\infty}$ is defined in $I$, and that $|x_n - r| \le \left(\frac{M\delta}{2P}\right)^{n-1} |x_1 - r|$ for all $n \in \mathbb{N}$. 
Prove that $\{x_n\}_{n=1}^{\infty}$ is convergent, and that $\lim_{n\to\infty} x_n = r$, which proves Part (b) 
of this exercise. [Use Exercise 8.2.9.] 
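
The iteration in Equation 8.4.2 is easy to try numerically. The following Python sketch is only an illustration and is not part of the exercise. The function name newton_iterates, the sample function $f(x) = x^2 - 2$ on $[1,2]$ and the starting point $x_1 = 1.5$ are our own assumptions, chosen because this $f$ satisfies the hypotheses of the theorem: $f(1) < 0 < f(2)$, $f'' = 2$ is bounded, and $f'(x) = 2x \ge 2$ on $[1,2]$.

```python
from typing import Callable, List


def newton_iterates(
    f: Callable[[float], float],
    f_prime: Callable[[float], float],
    x1: float,
    steps: int,
) -> List[float]:
    """Return [x_1, ..., x_{steps+1}], where x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        slope = f_prime(x)
        if slope == 0.0:
            # Equation 8.4.2 is undefined when f'(x_n) = 0.
            raise ZeroDivisionError("f'(x_n) = 0; the iteration cannot continue")
        xs.append(x - f(x) / slope)
    return xs


# Hypothetical example: f(x) = x^2 - 2 on [1, 2], whose root is r = sqrt(2).
root = 2.0 ** 0.5
iterates = newton_iterates(lambda x: x * x - 2.0, lambda x: 2.0 * x, x1=1.5, steps=5)
errors = [abs(x - root) for x in iterates]

for n, (x, err) in enumerate(zip(iterates, errors), start=1):
    print(f"x_{n} = {x:.15f}   |x_{n} - r| = {err:.3e}")
```

For this choice of $f$ we may take $M = 2$ and $P = 2$, and each printed error is at most $\frac{M}{2P}$ times the square of the previous one (up to floating-point rounding), which is the kind of estimate that makes Newton’s Method fast.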


8.5 Historical Remarks 


Similarly to the historical remarks concerning limits to infinity in Section 6.5, the 
historical remarks for the present chapter are very brief. This chapter discusses 
sequences, and limits of sequences are a type of limit to infinity, and are also a 
variation of ordinary limits of functions, and hence much of the history of the material 
in this chapter overlaps with the history discussed in Sections 3.6 and 6.5. 


Ancient World 


Zeno of Elea (c. 490-c. 425 BCE) gave four arguments, which have been preserved in 
Book VI of the Physics of Aristotle (384-322 BCE), to show that there is no motion. 
For example, Zeno’s first argument, the Dichotomy, states that for an object to move 
from one place to another, it must first pass through the midpoint, and then through the 
midpoint of what remains, ad infinitum, and so the destination will never be reached, 
presumably because it is not thought possible to traverse an infinite number of points 
in a finite amount of time; hence motion is not possible. Though this argument is 
not about sequences per se, in fact it is essentially about the sequence $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots$, 
those lengths being the distance to each successive midpoint, and proportional to 
the time to traverse such lengths while traveling at uniform speed. From a modern 
perspective the resolution of Zeno’s argument is that the series $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots$ 
is convergent, and in fact the sum of the series is 1. The second of Zeno’s arguments, 
the Achilles, is similar. Zeno’s arguments had an influence on subsequent discussion 
of infinitesimals and the infinite, and in particular it appears to have been the reason 
(or one of the reasons) that the ancient Greeks did not use limits of sequences. 
Limits of sequences arose naturally in some geometry problems such as area and 
volume calculations, where the region is approximated by a sequence of polygons or 
polyhedra with an increasing number of sides, but because the ancient Greeks did not 
take limits, they developed the method of exhaustion as an alternative way to handle 
such situations. This method, which is attributed to Eudoxus of Cnidus (408-355 
BCE), was used by Euclid (c. 325—c. 265 BCE) in the Elements, and was brought to 
its peak by Archimedes (287-212 BCE). In modern terms, the method of exhaustion 
avoids “letting n go to infinity” in the limit of a sequence by using a double proof by 
contradiction (referred to as reductio ad absurdum), which is complicated in practice, 
and has the drawback that it can only prove that a sequence converges to a number, 
but it cannot produce the number itself. Although today we can find the limit of a 
sequence directly without using the complicated arguments of reductio ad absurdum, 
it is important to observe that when we use the $\varepsilon$-$N$ definition of $\lim_{n\to\infty} a_n$ we also avoid 
dealing explicitly with something called infinity (even though for the sake of brevity 
we use the symbolic notation “$n \to \infty$”), and so in a sense the modern approach to 
limits of sequences recaptures something of what the ancient Greeks wanted. The 
modern approach, however, is much more convenient to use. 

The method of compression of Archimedes, used in area and volume problems, is 
a reductio ad absurdum method that is similar to the following modern formulation 
concerning sequences. Let $s, c \in \mathbb{R}$, and let $\{u_n\}_{n=1}^{\infty}$ and $\{l_n\}_{n=1}^{\infty}$ be sequences in 
$\mathbb{R}$ with positive terms. Suppose that $l_n < s < u_n$ and $l_n < c < u_n$ for all $n \in \mathbb{N}$. The 
difference form of the method of compression is equivalent to proving that $s = c$ by 
proving that for each $\varepsilon > 0$, the inequality $u_n - l_n < \varepsilon$ holds for all sufficiently large 
$n \in \mathbb{N}$; the quotient form is equivalent to proving that $s = c$ by proving that for each 
$\alpha > 1$, the inequality $\frac{u_n}{l_n} < \alpha$ holds for all sufficiently large $n \in \mathbb{N}$. 


Medieval Period 


In opposition to Aristotle, who did not believe in the existence of the infinite (but 
only in a potential infinite), and whose work was very influential in medieval Europe, 
Gregory of Rimini (1300-1358) maintained that God could create an infinite stone by 
creating equal-sized stones at each of the times $t = 0, \frac{1}{2}, \frac{3}{4}, \frac{7}{8}, \ldots$. That is, God could 
create sequences. 


Eighteenth Century 


Leonhard Euler (1707—1783) seems to have viewed sequences as he did other func- 
tions; that is, he viewed sequences as given by appropriate formulas, rather than 
as arbitrary functions with domain $\mathbb{N}$, which is how we understand sequences today. 
From Euler’s perspective, if a sequence $\{a_n\}_{n=1}^{\infty}$ is given, it would define a 
unique function $f\colon \mathbb{R} \to \mathbb{R}$, simply by using the same formula. Carl Friedrich Gauss 
(1777-1855) viewed sequences as we do today, and in particular understood that a 
sequence does not specify a unique function $f\colon \mathbb{R} \to \mathbb{R}$. However, Gauss’ treatment 
of sequences was not entirely rigorous, and he implicitly used some results that we 
now prove, such as the Monotone Convergence Theorem. 


Nineteenth Century 


In their attempt at providing rigorous proofs of some basic facts about continuity, 
Bernard Bolzano (1781-1848) and Augustin Louis Cauchy (1789-1857) made use of 
what we now call the Cauchy Completeness Theorem, though they could not prove it 
because they lacked the axiomatic properties of the real numbers. Bolzano did provide 
a proof that the Cauchy Completeness Theorem implied the Least Upper Bound Prop- 
erty, using the idea of bisection (similar to the method of Exercise 8.4.9). Cauchy’s 
proof of the Intermediate Value Theorem relied implicitly upon the Monotone Con- 
vergence Theorem, and explicitly on the fact that a continuous function works nicely 
with respect to convergent sequences (as stated in Corollary 8.4.2). In the 1860s Karl 
Weierstrass (1815-1897) used a bisection argument similar to Bolzano’s to prove a 
version of what we now call the Bolzano—Weierstrass Theorem for bounded infinite 
sets (rather than for bounded sequences as in the version of this theorem in the present 
text). 

Richard Dedekind (1831-1916), using his construction of the real numbers from 
the rational numbers in Stetigkeit und irrationale Zahlen of 1872 (originally formu- 
lated in lectures in 1858), provided what was probably the first rigorous proof of the 
Monotone Convergence Theorem. Such a proof was not possible without a rigorous 
treatment of the real numbers. 


9 


Series 


9.1 Introduction 


Now that we have seen sequences of real numbers in Chapter 8, we turn to the related 
notion of series of real numbers. We have already encountered series informally, for 
example in Section 5.8, where we discussed sets of measure zero. We now give a 
rigorous treatment of series. 

A series is a sum of countably many numbers, added up in the given order. 
Although series might appear innocuous at first glance, because they seem to resemble 
finite sums, in fact infinite sums are much trickier than finite ones, as we will see, for 
example, in Section 9.4. 

One of the most important types of series is power series, which are an extremely 
useful tool in a wide range of mathematics and its applications, including numerical 
calculations. We commence our treatment of power series in this chapter, where 
we think of power series as a certain type of series with a “variable.” However, 
to make full use of power series we will need to approach them from a different 
perspective, and we will therefore put off completing our discussion of power series 
until Section 10.4. 


9.2 Series 


Intuitively, a series is the sum of a sequence. For example, if we are given the sequence 


$$\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots,$$

we can form the series 

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots,$$

which can also be written as 

$$\sum_{n=1}^{\infty} \frac{1}{2^n}.$$

More generally, we have the following definition. 


Definition 9.2.1. A series in $\mathbb{R}$ (also called a series of real numbers) is a formal sum 

$$\sum_{n=1}^{\infty} a_n = a_1 + a_2 + a_3 + \cdots,$$

where $\{a_n\}_{n=1}^{\infty}$ is a sequence in $\mathbb{R}$. Each number $a_n$, where $n \in \mathbb{N}$, is called a term of 
the series $\sum_{n=1}^{\infty} a_n$. △ 


The expression “formal sum” means simply a sequence of numbers with addition 
symbols written between the terms of the sequence. Of course, while we can certainly 
write down a formal sum of infinitely many numbers, we have to ask whether such 
an infinite sum actually means anything, beyond being just symbols written on the 
page; not everything that can be written down actually means something. In particular, 
addition for real numbers has been defined for only two real numbers at a time. Using 
Definition by Recursion it is possible to extend the definition of addition to the sum of 
any finite collection of real numbers; see Exercise 2.5.19 for details. However, there 
is no direct way to extend the definition of addition of real numbers to infinitely many 
real numbers. Indeed, there ought not to be such a definition, because it is evident that 
not every infinite collection of real numbers adds up to a real number. For example, 
consider the sequence $1^2, 2^2, 3^2, \ldots$. We can formally write down the infinite sum 
$1^2 + 2^2 + 3^2 + \cdots$, but there is no hope that this sum will equal a real number. On the 
other hand, it turns out that some infinite sums of real numbers do equal real numbers. 

To obtain an intuitive feel for how an infinite collection of numbers can add up to 
a finite amount, let us view the situation backwards. Consider a unit square, as shown 
in the first row of Figure 9.2.1. The area of the square is 1. Cut the square into two 
equal pieces, each of which has height $\frac{1}{2}$, and rearrange the pieces as shown in the 
second row of the figure. Each piece has area $\frac{1}{2}$, and the combined area of the two 
pieces is still 1, which we can express by writing $1 = \frac{1}{2} + \frac{1}{2}$. Next, we cut one of the 
rectangles with height $\frac{1}{2}$ into two pieces and rearrange, as shown in the third row of 
the figure; in this case the area is $1 = \frac{1}{2} + \frac{1}{4} + \frac{1}{4}$. In the fourth and fifth rows of the 
figure we see the result of continuing this process two more times, the last one with 
area $1 = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{16}$. Continuing in this fashion, it seems plausible that 

$$1 = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots.$$

This equation turns out to be correct; we will see a proof in Example 9.2.4 (1). 
Intuitively, an infinite sum adds up to a finite amount if the terms of the series 
go to zero sufficiently fast, though it is hard to make the intuitive notion of “going 
to zero sufficiently fast’ into a rigorous definition. For our formal definition of the 
convergence of series we use a different idea, which is that to attempt to find the sum 
of a series, we can add the first two terms, then the first three terms, then the first four 
terms and so on, and see what happens as a result of this process. In this way, we 
reduce the question of the convergence of a series to a question of the convergence 
of a certain sequence. It is important to note that in this process, we always add the 
terms of the series in the given order, rather than all at once; as we will see in our 
discussion of rearrangements of series in Section 9.4, sticking to the given order of 
the terms of a series is important. 


Fig. 9.2.1. 


Definition 9.2.2. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. 


1. For each $k \in \mathbb{N}$, the $k^{\text{th}}$ partial sum of $\sum_{n=1}^{\infty} a_n$, denoted $s_k$, is defined by 
$s_k = \sum_{i=1}^{k} a_i$. The sequence of partial sums of $\sum_{n=1}^{\infty} a_n$ is the sequence $\{s_n\}_{n=1}^{\infty}$. 

2. Let $L \in \mathbb{R}$. The number $L$ is the sum of $\sum_{n=1}^{\infty} a_n$, written 

$$\sum_{n=1}^{\infty} a_n = L,$$

if the sequence of partial sums $\{s_n\}_{n=1}^{\infty}$ is convergent and $\lim_{n\to\infty} s_n = L$. If 
$\sum_{n=1}^{\infty} a_n = L$, we also say that $\sum_{n=1}^{\infty} a_n$ converges to $L$. If $\sum_{n=1}^{\infty} a_n$ converges 
to some real number, we say that $\sum_{n=1}^{\infty} a_n$ is convergent; otherwise we say 
that $\sum_{n=1}^{\infty} a_n$ is divergent. 

3. The series $\sum_{n=1}^{\infty} a_n$ diverges to infinity, written 

$$\sum_{n=1}^{\infty} a_n = \infty,$$

if $\lim_{n\to\infty} s_n = \infty$. The series $\sum_{n=1}^{\infty} a_n$ diverges to negative infinity, written 

$$\sum_{n=1}^{\infty} a_n = -\infty,$$

if $\lim_{n\to\infty} s_n = -\infty$. △ 


The following lemma, which states that if a series is convergent then its sum is 
unique, follows immediately from Lemma 8.2.3, and we omit the proof. 


Lemma 9.2.3. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. If $\sum_{n=1}^{\infty} a_n = L$ for some $L \in \mathbb{R}$, then $L$ is 
unique. 


Because of Lemma 9.2.3 we can refer to “the” sum of a series, if the sum exists. 

Example 9.2.4. 


(1) We will prove that $\sum_{n=1}^{\infty} \frac{1}{2^n}$ is convergent and $\sum_{n=1}^{\infty} \frac{1}{2^n} = 1$. Let $k \in \mathbb{N}$. Using 
Exercise 2.5.6, we see that the $k^{\text{th}}$ partial sum of our series is 

$$s_k = \sum_{i=1}^{k} \frac{1}{2^i} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^k} = 1 - \frac{1}{2^k}.$$

By Exercise 2.5.13 (2) we know that $k < 2^k$, and hence 

$$1 - \frac{1}{k} < s_k < 1.$$

Using Example 8.2.4 (1) (2) and Theorem 8.2.9 (2), we see that $\lim_{n\to\infty} 1 = 1$ and 
$\lim_{n\to\infty} \left(1 - \frac{1}{n}\right) = 1$. It now follows from the Squeeze Theorem for Sequences 
(Theorem 8.2.12) that $\lim_{n\to\infty} s_n = 1$. Hence, by the definition of the convergence of series, we 
deduce that $\sum_{n=1}^{\infty} \frac{1}{2^n}$ is convergent and $\sum_{n=1}^{\infty} \frac{1}{2^n} = 1$. 
(2) We will prove that the series $\sum_{n=1}^{\infty} (-1)^{n-1}$ is divergent. Let $k \in \mathbb{N}$. The $k^{\text{th}}$ 
partial sum of our series is 

$$s_k = 1 + (-1) + 1 + \cdots + (-1)^{k-1} = \begin{cases} 0, & \text{if } k \text{ is even} \\ 1, & \text{if } k \text{ is odd.} \end{cases}$$

It can be proved by an $\varepsilon$-$N$ proof similar to the proof in Example 8.2.4 (4) that 
$\{s_n\}_{n=1}^{\infty}$ is divergent. However, we can avoid such a proof by observing that $s_n = 
\frac{1}{2}\left[1 + (-1)^{n-1}\right]$ for all $n \in \mathbb{N}$. If the sequence $\{s_n\}_{n=1}^{\infty}$ were convergent, then it would 
follow from Theorem 8.2.9 that $\{(-1)^n\}_{n=1}^{\infty} = \{1 - 2s_n\}_{n=1}^{\infty}$ is convergent, which is 
a contradiction to Example 8.2.4 (4). Hence $\{s_n\}_{n=1}^{\infty}$ is divergent, which means that 
$\sum_{n=1}^{\infty} (-1)^{n-1}$ is divergent. 


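(3) We will prove that $\sum_{n=1}^{\infty} \frac{2}{n(n+1)} = 2$. Observe that if $n \in \mathbb{N}$, then 
$\frac{2}{n(n+1)} = \frac{2}{n} - \frac{2}{n+1}$. Let $k \in \mathbb{N}$. The $k^{\text{th}}$ partial sum of our series is 

$$s_k = \left(\frac{2}{1} - \frac{2}{2}\right) + \left(\frac{2}{2} - \frac{2}{3}\right) + \left(\frac{2}{3} - \frac{2}{4}\right) + \cdots + \left(\frac{2}{k} - \frac{2}{k+1}\right) = 2 - \frac{2}{k+1}.$$

Example 8.2.4 (2), Theorem 8.2.9 (3) and Exercise 8.2.4 imply that $\lim_{n\to\infty} \frac{2}{n+1} = 0$, and 
it then follows from Example 8.2.4 (1) and Theorem 8.2.9 (2) that $\lim_{n\to\infty} s_n = 2$. Hence 
$\sum_{n=1}^{\infty} \frac{2}{n(n+1)}$ is convergent and $\sum_{n=1}^{\infty} \frac{2}{n(n+1)} = 2$. This series is an example of what is 
called a “telescoping series.” 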
(4) Let $a, r \in \mathbb{R}$. The series $\sum_{n=1}^{\infty} ar^{n-1}$ is called a geometric series. This type of 
series is widely used in mathematics and its applications. If $a = 0$ then the series is 
constantly zero, which is convergent for all values of $r$; from now on we will assume 
that $a \neq 0$. Writing out the terms of the series, we see that 

$$\sum_{n=1}^{\infty} ar^{n-1} = a + ar + ar^2 + ar^3 + \cdots.$$

Observe that the ratio of each term in the series to the previous term is $r$; such a 
constant ratio characterizes geometric series. 
Let $k \in \mathbb{N}$. The $k^{\text{th}}$ partial sum of our series is 

$$s_k = a + ar + ar^2 + \cdots + ar^{k-1}.$$

If $r = 1$, then clearly $s_k = ka$. If $r \neq 1$, then Exercise 2.5.12 (3) states that 

$$s_k = \frac{a(1 - r^k)}{1 - r}.$$

Is the sequence $\{s_n\}_{n=1}^{\infty}$ convergent? It depends upon the value of $r$. First, suppose 
that $r = 1$. Then $\{s_n\}_{n=1}^{\infty}$ is given by $\{na\}_{n=1}^{\infty}$, and this sequence is divergent by 
Exercise 8.2.1 (3) and Exercise 8.2.12 (3). 

Second, suppose that $r \neq 1$. Then $\{s_n\}_{n=1}^{\infty} = \left\{\frac{a(1 - r^n)}{1 - r}\right\}_{n=1}^{\infty}$. We saw in 
Example 8.2.13 that the sequence $\{r^n\}_{n=1}^{\infty}$ is convergent if and only if $-1 < r \le 1$, and 
that if $-1 < r < 1$ then $\lim_{n\to\infty} r^n = 0$. Using Example 8.2.4 (1), Theorem 8.2.9 and 
Exercise 8.2.12, we deduce that $\left\{\frac{a(1 - r^n)}{1 - r}\right\}_{n=1}^{\infty}$ is convergent if and only if $-1 < r < 1$ 
(we are assuming here that $r \neq 1$), and that if $-1 < r < 1$ then $\lim_{n\to\infty} s_n = \frac{a(1 - 0)}{1 - r} = \frac{a}{1 - r}$. 
Therefore $\sum_{n=1}^{\infty} ar^{n-1}$ is convergent if and only if $-1 < r < 1$, and if $-1 < r < 1$ 
then $\sum_{n=1}^{\infty} ar^{n-1} = \frac{a}{1 - r}$. 

(5) We will prove that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent. This series is known as the 
harmonic series, and it is an interesting series even though it is not convergent; see 
the comment at the end of Example 9.3.7. This proof of divergence is due to Nicole 
Oresme (1323-1382). 

Let $k \in \mathbb{N}$. The $k^{\text{th}}$ partial sum of our series is 

$$s_k = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}.$$

There is no simple formula for $s_k$, but it turns out that we can prove that $\{s_n\}_{n=1}^{\infty}$ is 
divergent even without such a formula. 

The hard work for our proof was done in Exercise 2.5.7, which states that if $n \in \mathbb{N}$, 
then $s_{2^n} \ge 1 + \frac{n}{2}$. It follows from Exercise 8.2.15, Example 8.2.4 (1) and Exercise 8.2.17 
that $\lim_{n\to\infty} \left(1 + \frac{n}{2}\right) = \infty$, and it then follows from Exercise 8.2.18 (1) that $\lim_{n\to\infty} s_{2^n} = \infty$. In 
particular, we deduce that $\{s_{2^n}\}_{n=1}^{\infty}$ is divergent. By Lemma 8.3.7 we see that $\{s_n\}_{n=1}^{\infty}$ 
is divergent, and hence the series $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent. 
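
The partial sums in Example 9.2.4 can also be explored numerically. The following Python sketch is an illustration under our own assumptions: the helper name partial_sums, the cutoff $k = 10$ and the sample geometric series with $a = 3$ and $r = -\frac{2}{3}$ are hypothetical choices, not taken from the text. Exact rational arithmetic (the standard fractions module) is used so that the closed formulas for the partial sums can be checked without rounding error.

```python
from fractions import Fraction
from typing import Callable, List


def partial_sums(term: Callable[[int], Fraction], k: int) -> List[Fraction]:
    """Return [s_1, ..., s_k], where s_j = term(1) + ... + term(j)."""
    sums: List[Fraction] = []
    total = Fraction(0)
    for n in range(1, k + 1):
        total += term(n)
        sums.append(total)
    return sums


k = 10
a, r = Fraction(3), Fraction(-2, 3)  # hypothetical geometric series with |r| < 1

halves = partial_sums(lambda n: Fraction(1, 2 ** n), k)            # Part (1)
telescoping = partial_sums(lambda n: Fraction(2, n * (n + 1)), k)  # Part (3)
geometric = partial_sums(lambda n: a * r ** (n - 1), k)            # Part (4)
harmonic = partial_sums(lambda n: Fraction(1, n), 2 ** k)          # Part (5)

for j in range(1, k + 1):
    # Closed forms for the j-th partial sums, checked exactly.
    assert halves[j - 1] == 1 - Fraction(1, 2 ** j)
    assert telescoping[j - 1] == 2 - Fraction(2, j + 1)
    assert geometric[j - 1] == a * (1 - r ** j) / (1 - r)
    # Oresme's bound s_{2^j} >= 1 + j/2 (s_{2^j} sits at list index 2^j - 1).
    assert harmonic[2 ** j - 1] >= 1 + Fraction(j, 2)

print("s_10 for the series of halves:", float(halves[-1]))
print("s_10 for a = 3, r = -2/3:", float(geometric[-1]), " a/(1-r):", float(a / (1 - r)))
print("s_1024 for the harmonic series:", float(harmonic[-1]))
```

Such a computation proves nothing about convergence, of course; it only shows the first few partial sums behaving as the proofs above predict, with the harmonic partial sums growing slowly but without any visible bound.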

We need the following two very simple observations about series. First, a series 
does not have to start with $n = 1$. If $k \in \mathbb{Z}$, we can just as well consider series of the 
form $\sum_{n=k}^{\infty} a_n$. For simplicity, we will state all results about series starting with $n = 1$, 
except in some situations where starting with n = 0 is more convenient. Second, the 
convergence of a series is unaffected by changing, or dropping, finitely many terms of 
the series; see Exercise 9.2.4 and Exercise 9.2.5 for details. 

The series seen in Example 9.2.4 are actually atypical, in that we were able to 
find nice formulas for the partial sums for most of these series. In general, however, it 
is not possible in practice to find useful formulas for the partial sums of most series, 
and hence other, more indirect, tests for convergence are used. There are a variety 
of such tests, some of which will be seen in Sections 9.3 and 9.4. The simplest such 
test, which we now state, is one that yields only divergence, not convergence. The 
idea of this test is that if a series is convergent, then the sequence of partial sums is 
convergent, and hence the partial sums get closer and closer to each other, and hence 
the terms of the series must go to zero. 


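Theorem 9.2.5 (Divergence Test). Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. If $\{a_n\}_{n=1}^{\infty}$ does not 
converge to 0, then $\sum_{n=1}^{\infty} a_n$ is divergent. 

Proof. Suppose that $\sum_{n=1}^{\infty} a_n$ is convergent. Let $L = \sum_{n=1}^{\infty} a_n$. Let $\{s_n\}_{n=1}^{\infty}$ be the 
sequence of partial sums of $\sum_{n=1}^{\infty} a_n$. Then $\lim_{n\to\infty} s_n = L$. Let $\varepsilon > 0$. Then there is 
some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $|s_n - L| < \frac{\varepsilon}{2}$. Suppose that $n \in \mathbb{N}$ and 
$n \ge N + 1$. Then $n - 1 \ge N$, and therefore 

$$|a_n - 0| = |s_n - s_{n-1}| = |s_n - L + L - s_{n-1}| \le |s_n - L| + |L - s_{n-1}| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

It follows that $\lim_{n\to\infty} a_n = 0$. 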


The Divergence Test (Theorem 9.2.5), though quite simple, is often the source 
of student error in calculus courses, because of the failure to distinguish between the 
contrapositive and the converse of a statement. The contrapositive of the Divergence 
Test says that if $\sum_{n=1}^{\infty} a_n$ is convergent, then $\{a_n\}_{n=1}^{\infty}$ converges to 0, and that is 
certainly true. However, it is definitely not the case that if $\{a_n\}_{n=1}^{\infty}$ converges to 0, 
then $\sum_{n=1}^{\infty} a_n$ must be convergent, as seen in Example 9.2.4 (5). 

We conclude this section with the following basic result about the convergence of 
series. 


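Theorem 9.2.6. Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series in $\mathbb{R}$, and let $k \in \mathbb{R}$. Suppose that 
$\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent. 


1. $\sum_{n=1}^{\infty} (a_n + b_n)$ is convergent and $\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$. 
2. $\sum_{n=1}^{\infty} (a_n - b_n)$ is convergent and $\sum_{n=1}^{\infty} (a_n - b_n) = \sum_{n=1}^{\infty} a_n - \sum_{n=1}^{\infty} b_n$. 
3. $\sum_{n=1}^{\infty} ka_n$ is convergent and $\sum_{n=1}^{\infty} ka_n = k \sum_{n=1}^{\infty} a_n$. 

Proof. We will prove Part (1); the other parts are similar, and we omit the details. 


(1) Let $\{s_n\}_{n=1}^{\infty}$ and $\{t_n\}_{n=1}^{\infty}$ and $\{u_n\}_{n=1}^{\infty}$ be the sequences of partial sums of 
$\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} (a_n + b_n)$, respectively. Because $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ 
are convergent, then $\{s_n\}_{n=1}^{\infty}$ and $\{t_n\}_{n=1}^{\infty}$ are convergent, and $\lim_{n\to\infty} s_n = \sum_{n=1}^{\infty} a_n$ and 
$\lim_{n\to\infty} t_n = \sum_{n=1}^{\infty} b_n$. Using the Associative and Commutative Laws for Addition, which 
work for all finite sums, we see that 

$$u_k = \sum_{i=1}^{k} (a_i + b_i) = \sum_{i=1}^{k} a_i + \sum_{i=1}^{k} b_i = s_k + t_k$$

for all $k \in \mathbb{N}$. It now follows from Theorem 8.2.9 (1) that 

$$\lim_{n\to\infty} u_n = \lim_{n\to\infty} (s_n + t_n) = \lim_{n\to\infty} s_n + \lim_{n\to\infty} t_n = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n.$$

Hence $\sum_{n=1}^{\infty} (a_n + b_n)$ is convergent and $\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$. 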


The reader might wonder why there is no mention of products and quotients of 
series in Theorem 9.2.6. The answer is that they do not work out as nicely as sums and 
differences. We will not discuss quotients of series in this text, because there is no nice 
formula for such quotients even for nicely behaved series, but we will discuss products 
of series very briefly here, and in more detail in Section 9.4 when we have an additional 
concept at our disposal. Consider the equation $\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$ 
from Theorem 9.2.6 (1). We want to look at the analog for products of series of 
each side of this equation. Suppose that $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent series. Is 
$\sum_{n=1}^{\infty} a_n b_n$ necessarily convergent? The answer, as seen in Example 9.2.7 below, is 
in general no. In Exercise 9.3.5 (1), it is seen that if $a_n \ge 0$ and $b_n \ge 0$ for all $n \in \mathbb{N}$, 
then $\sum_{n=1}^{\infty} a_n b_n$ is convergent. However, as seen in Part (3) of that exercise, even if 
$\sum_{n=1}^{\infty} a_n b_n$ is convergent, it is not necessarily equal to $\left[\sum_{n=1}^{\infty} a_n\right] \cdot \left[\sum_{n=1}^{\infty} b_n\right]$. We then 
ask, separately from the question of the convergence of $\sum_{n=1}^{\infty} a_n b_n$, whether there is a 
nice formula for $\left[\sum_{n=1}^{\infty} a_n\right] \cdot \left[\sum_{n=1}^{\infty} b_n\right]$. As seen in Section 9.4, it turns out that in some 
cases the answer is yes, and in other cases the answer is no. 


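Example 9.2.7. In Exercise 9.4.1 it will be seen that the series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{\sqrt{n}}$ is 
convergent. If we take both $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ to be this series, then $\sum_{n=1}^{\infty} a_n b_n = 
\sum_{n=1}^{\infty} \frac{1}{n}$, and we saw in Example 9.2.4 (5) that $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent. Also, the series 
$\sum_{n=1}^{\infty} \frac{a_n}{b_n} = \sum_{n=1}^{\infty} 1$ is divergent. ♦ 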
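
The contrast in Example 9.2.7 can be seen in a rough numerical experiment. The following Python sketch is only an illustration in floating-point arithmetic; the names partial_sum and a, and the cutoffs used, are our own assumptions. It compares partial sums of $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{\sqrt{n}}$ with partial sums of the series of termwise products, which is the harmonic series.

```python
import math


def partial_sum(term, k):
    """Return term(1) + ... + term(k), summed accurately as floats."""
    return math.fsum(term(n) for n in range(1, k + 1))


def a(n):
    # Term of the series in Example 9.2.7: (-1)^(n-1) / sqrt(n).
    return (-1) ** (n - 1) / math.sqrt(n)


for k in (10, 100, 1_000, 10_000):
    s_k = partial_sum(a, k)                      # partial sum of sum a_n
    p_k = partial_sum(lambda n: a(n) * a(n), k)  # partial sum of sum a_n * a_n = sum 1/n
    print(f"k = {k:6d}   sum a_n: {s_k:.4f}   sum a_n * a_n: {p_k:.4f}")
```

The first column of sums settles down (consecutive partial sums differ by only $\frac{1}{\sqrt{k+1}}$), whereas the second keeps growing. A table of values is not a proof, but it matches what the example asserts.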


Reflections 


In a typical second-semester calculus course, part of the semester is devoted 
to “sequences and series.” In practice, sequences are discussed only insofar as they 
are needed to define the convergence of series via partial sums, and to some extent 
series are discussed only insofar as they are needed to determine the intervals of 
convergence of power series. That approach to series is appropriate for such courses, 
because from the applied point of view of introductory calculus courses it is power 
series, as opposed to series in general, that is the most important tool for applications. 
For example, power series are useful in finding solutions of differential equations. 
From the perspective of real analysis, on the other hand, while we still use series in 
the service of power series, series are studied in their own right as well, there being 
some very interesting and surprising results about series, for example Theorem 9.4.15 
about rearrangements of conditionally convergent series. 


Exercises 


Exercise 9.2.1. Prove that 7, GacTVGndD) is convergent, and find its sum. 


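Exercise 9.2.2. [Used in Exercise 9.3.2, Lemma 9.5.3 and Theorem 10.4.4.] Let $\sum_{n=1}^{\infty} a_n$ 
be a series in $\mathbb{R}$. Prove that if $\sum_{n=1}^{\infty} a_n$ is convergent, then $\{a_n\}_{n=1}^{\infty}$ is bounded. 


Exercise 9.2.3. [Used in Example 10.4.6.] Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$, and let $k \in \mathbb{R}$. 
Suppose that $k \neq 0$. Prove that if $\sum_{n=1}^{\infty} a_n$ is divergent, then $\sum_{n=1}^{\infty} ka_n$ is divergent. 


Exercise 9.2.4. [Used in Section 9.2 and Exercise 9.5.5.] Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$, 
and let $k \in \mathbb{N}$. Prove that $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $\sum_{n=k}^{\infty} a_n$ is convergent. 


Exercise 9.2.5. [Used in Section 9.2 and Theorem 9.3.2.] Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be 
series in $\mathbb{R}$. Suppose that there is some $P \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge P$ imply $a_n = b_n$. 
Prove that $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $\sum_{n=1}^{\infty} b_n$ is convergent. 


Exercise 9.2.6. [Used in Theorem 10.4.18.] Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series in $\mathbb{R}$. 
Suppose that $a_n \le b_n$ for all $n \in \mathbb{N}$, and that $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent. 


(1) Prove that $\sum_{n=1}^{\infty} a_n \le \sum_{n=1}^{\infty} b_n$. 
(2) Suppose that $a_p < b_p$ for some $p \in \mathbb{N}$. Prove that $\sum_{n=1}^{\infty} a_n < \sum_{n=1}^{\infty} b_n$. 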


Exercise 9.2.7. [Used in Example 9.4.11.] Let $\sum_{n=1}^{\infty} a_n$ be a series. Suppose that 
$\sum_{n=1}^{\infty} a_n$ is convergent. Prove that the series 

$$0 + a_1 + 0 + a_2 + 0 + a_3 + 0 + a_4 + 0 + \cdots$$

is convergent, and that it has the same sum as $\sum_{n=1}^{\infty} a_n$. This result might seem obvious, 
but a proof is needed. [Use Exercise 8.3.6.] 


Exercise 9.2.8. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. 


(1) Prove that if $\sum_{n=1}^{\infty} a_n$ is convergent, then $\sum_{n=1}^{\infty} (a_n - a_{n+1})$ is convergent. 
(2) If $\sum_{n=1}^{\infty} (a_n - a_{n+1})$ is convergent, does that necessarily imply that $\sum_{n=1}^{\infty} a_n$ is 
convergent? Give a proof or a counterexample. 


Exercise 9.2.9. [Used in Section 9.6.] It was proved in Example 9.2.4 (5) that the 
harmonic series is divergent. The purpose of this exercise is to give another proof 
of that fact. This proof is due to Pietro Mengoli (1626-1686). Let $\{s_n\}_{n=1}^{\infty}$ be the 
sequence of partial sums of the harmonic series. 


(1) Prove that $\frac{1}{n-1} + \frac{1}{n} + \frac{1}{n+1} > \frac{3}{n}$ for all $n \in \mathbb{N}$ such that $n \ge 2$. 
(2) Prove that $s_{3n+1} > 1 + s_n$ for all $n \in \mathbb{N}$. 
(3) Suppose that $\{s_n\}_{n=1}^{\infty}$ is convergent, and derive a contradiction. 


Exercise 9.2.10. Let $r \in \mathbb{R}$. Suppose that $|r| < 1$. Does the series 7) yom converge 
or diverge? Justify your answer. 


Exercise 9.2.11. [Used in Exercise 9.3.11, Theorem 9.4.7 and Theorem 9.4.12.] Let 
$\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. Prove that $\sum_{n=1}^{\infty} a_n$ is convergent if and only if for each $\varepsilon > 0$, 
there is some $N \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n > m \ge N$ imply $\left|\sum_{i=m+1}^{n} a_i\right| < \varepsilon$. 


Exercise 9.2.12. [Used in Theorem 9.4.15.] Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$, and let $k \in \mathbb{N}$. 
Suppose that $\sum_{n=1}^{\infty} a_n = \infty$. Prove that $\sum_{n=k}^{\infty} a_n = \infty$. 


9.3 Convergence Tests 


As we saw in Section 9.2, a series is convergent, by definition, if the sequence of 
partial sums of the series is convergent. In practice, however, it is often very difficult 
to find an explicit formula for the partial sums of a given series, and therefore it is not 
always possible to verify whether a series is convergent or not by appealing directly 
to the definition of convergence. Fortunately, there are a number of convergence 
tests that can be used to determine whether or not various series converge. As with 
techniques of integration, no one convergence test treats all series, and in practice, for 
a particular series, one has to examine the various convergence tests to decide which 
one seems most likely to be helpful in the given situation. 

The disadvantage of the various convergence tests is that they tell us only whether 
or not a series is convergent, but not what the sum of the series is if it is convergent. 
However, knowing that a series is convergent in principle, even without knowing the 
sum, is better than not knowing anything about the convergence of the series. 

In this section we will see some well-known convergence tests. Most of these 
tests concern series with non-negative terms. At the heart of our treatment of series 
with non-negative terms is the following lemma, which is very easy to prove given 
our previous work. 


Lemma 9.3.1. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. Suppose that $a_n \ge 0$ for all $n \in \mathbb{N}$, or 
that $a_n \le 0$ for all $n \in \mathbb{N}$. Let $\{s_n\}_{n=1}^{\infty}$ be the sequence of partial sums of $\sum_{n=1}^{\infty} a_n$. 
Then $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $\{s_n\}_{n=1}^{\infty}$ is bounded. 


Proof. Suppose that $a_n \ge 0$ for all $n \in \mathbb{N}$. Then the sequence $\{s_n\}_{n=1}^{\infty}$ is increasing, 
and the lemma then follows immediately from the Monotone Convergence Theorem 
(Corollary 8.3.4). The other case is similar, and we omit the details. 


We start our discussion of convergence tests with the Comparison Test, which is 
perhaps the easiest convergence test to use. The idea of this test is that if a series with 
non-negative terms is known to be convergent, then any other series with non-negative 
terms, and which is term-by-term no greater than the convergent series, will also be 
convergent. 


Theorem 9.3.2 (Comparison Test). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series in $\mathbb{R}$. Suppose 
that $a_n \ge 0$ and $b_n \ge 0$ for all $n \in \mathbb{N}$, and that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ 
and $n \ge N$ imply $a_n \le b_n$. 

1. If $\sum_{n=1}^{\infty} b_n$ is convergent, then $\sum_{n=1}^{\infty} a_n$ is convergent. 
2. If $\sum_{n=1}^{\infty} a_n$ is divergent, then $\sum_{n=1}^{\infty} b_n$ is divergent. 


Proof. We will prove Part (1); Part (2), which is the contrapositive of Part (1), then 
follows immediately. 


(1) Suppose that $\sum_{n=1}^{\infty} b_n$ is convergent. 

First, suppose that $N = 1$. Then $a_n \le b_n$ for all $n \in \mathbb{N}$. Let $\{s_n\}_{n=1}^{\infty}$ and $\{t_n\}_{n=1}^{\infty}$ 
be the sequences of partial sums of $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$, respectively. Then $\{t_n\}_{n=1}^{\infty}$ 
is convergent. It follows from Lemma 8.2.6 that $\{t_n\}_{n=1}^{\infty}$ is bounded. Hence, there 
is some $M \in \mathbb{R}$ such that $|t_n| \le M$ for all $n \in \mathbb{N}$. Because $0 \le a_n \le b_n$ for all $n \in \mathbb{N}$, 
it follows that $0 \le s_n \le t_n$ for all $n \in \mathbb{N}$. Hence $|s_n| \le M$ for all $n \in \mathbb{N}$, which means 
that $\{s_n\}_{n=1}^{\infty}$ is bounded. Lemma 9.3.1 then implies that $\{s_n\}_{n=1}^{\infty}$ is convergent, and 
therefore $\sum_{n=1}^{\infty} a_n$ is convergent. 

Second, suppose that $N > 1$. Let $\{c_n\}_{n=1}^{\infty}$ be defined by 

$$c_n = \begin{cases} 0, & \text{if } n < N \\ a_n, & \text{if } n \ge N. \end{cases}$$

Then $0 \le c_n \le b_n$ for all $n \in \mathbb{N}$. Using the previous paragraph, we see that $\sum_{n=1}^{\infty} c_n$ is 
convergent. Exercise 9.2.5 then implies that $\sum_{n=1}^{\infty} a_n$ is convergent. 


Example 9.3.3. In Example 9.2.4 (5) we saw that $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent. Clearly $\frac{1}{n} < 
\frac{1}{n-0.3}$ for all $n \in \mathbb{N}$, and therefore by the Comparison Test (Theorem 9.3.2) it follows 
that $\sum_{n=1}^{\infty} \frac{1}{n-0.3}$ is divergent. On the other hand, we cannot use the Comparison Test 
to evaluate the convergence or divergence of $\sum_{n=1}^{\infty} \frac{1}{n+0.3}$, because $\frac{1}{n+0.3} < \frac{1}{n}$ for all 
$n \in \mathbb{N}$, and that inequality is “the wrong way” to use the Comparison Test. That is, the 
Comparison Test says that a series that is term-by-term smaller than a convergent series 
is convergent, and that a series that is term-by-term greater than a divergent series 
is divergent, but it does not say anything about a series that is term-by-term smaller 
than a divergent series, or term-by-term greater than a convergent series. On the other 
hand, it would seem intuitively reasonable that the series $\sum_{n=1}^{\infty} \frac{1}{n+0.3}$ ought to behave 
similarly to $\sum_{n=1}^{\infty} \frac{1}{n}$, because as $n$ becomes very large the 0.3 becomes negligible. In 
fact, these two series do behave similarly; it is simply that the Comparison Test does 
not help us in this situation. ♦ 


The following convergence test allows us to evaluate the convergence of the series 
in Example 9.3.3 for which the Comparison Test (Theorem 9.3.2) does not apply. 
This new convergence test also involves a comparison between two series, but it 
does not require a hypothesis as strict as $a_n \le b_n$ for all $n \in \mathbb{N}$ such that $n$ is greater than or equal to a given integer. The idea of this convergence test, called the Limit Comparison Test, is that if the ratios of the terms of two series behave nicely as $n$ goes to infinity, then the two series have comparable convergence properties.

The statement of the following theorem makes implicit use of the fact that if $\{c_n\}_{n=1}^\infty$ is a sequence in $\mathbb{R}$, and $c_n > 0$ for all $n \in \mathbb{N}$, and $\{c_n\}_{n=1}^\infty$ is convergent, then $\lim_{n\to\infty} c_n \ge 0$, which follows from Example 8.2.4 (1) and Theorem 8.2.11.



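Theorem 9.3.4 (Limit Comparison Test). Let $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ be series in $\mathbb{R}$. Suppose that $a_n > 0$ and $b_n > 0$ for all $n \in \mathbb{N}$, and that $\left\{\frac{a_n}{b_n}\right\}_{n=1}^\infty$ is convergent or it diverges to infinity. Let $L = \lim_{n\to\infty} \frac{a_n}{b_n}$.

1. Suppose that $L \in (0,\infty)$. Then $\sum_{n=1}^\infty a_n$ is convergent if and only if $\sum_{n=1}^\infty b_n$ is convergent.

2. Suppose that $L = 0$. If $\sum_{n=1}^\infty b_n$ is convergent then $\sum_{n=1}^\infty a_n$ is convergent.

3. Suppose that $L = \infty$. If $\sum_{n=1}^\infty b_n$ is divergent then $\sum_{n=1}^\infty a_n$ is divergent.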


Proof. We will prove Part (1), leaving the rest to the reader in Exercise 9.3.2. 


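(1) Because $L \in (0,\infty)$, then $\frac{L}{2} > 0$. Hence there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $\left|\frac{a_n}{b_n} - L\right| < \frac{L}{2}$. Hence $n \in \mathbb{N}$ and $n \ge N$ imply $\frac{L}{2} b_n < a_n < \frac{3L}{2} b_n$. By the Comparison Test (Theorem 9.3.2) it follows that if $\sum_{n=1}^\infty a_n$ is convergent then $\sum_{n=1}^\infty \frac{L}{2} b_n$ is convergent, and that if $\sum_{n=1}^\infty \frac{3L}{2} b_n$ is convergent then $\sum_{n=1}^\infty a_n$ is convergent.

However, we can use Theorem 9.2.6 (3) to see that $\sum_{n=1}^\infty \frac{L}{2} b_n$ and $\sum_{n=1}^\infty \frac{3L}{2} b_n$ are each convergent if and only if $\sum_{n=1}^\infty b_n$ is convergent. It now follows that $\sum_{n=1}^\infty a_n$ is convergent if and only if $\sum_{n=1}^\infty b_n$ is convergent.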


It is seen in Exercise 9.3.3 that Theorem 9.3.4 (2) (3) cannot be made into if and 
only if statements. 


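Example 9.3.5. In Example 9.3.3 we conjectured that $\sum_{n=1}^\infty \frac{1}{n+0.3}$ is divergent, though we did not yet have the tools to prove it. We can now prove this result by applying the Limit Comparison Test (Theorem 9.3.4) to $\sum_{n=1}^\infty \frac{1}{n}$ and $\sum_{n=1}^\infty \frac{1}{n+0.3}$. Using Example 8.2.4 (1) (2) and Theorem 8.2.9 (1) (3) we see that

$$L = \lim_{n\to\infty} \frac{\frac{1}{n}}{\frac{1}{n+0.3}} = \lim_{n\to\infty} \frac{n+0.3}{n} = \lim_{n\to\infty} \left(1 + \frac{0.3}{n}\right) = 1.$$

It follows from the Limit Comparison Test that $\sum_{n=1}^\infty \frac{1}{n}$ is convergent if and only if $\sum_{n=1}^\infty \frac{1}{n+0.3}$ is convergent. We saw in Example 9.2.4 (5) that $\sum_{n=1}^\infty \frac{1}{n}$ is divergent, and hence $\sum_{n=1}^\infty \frac{1}{n+0.3}$ is divergent. ♦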
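
The limit computed in Example 9.3.5 can also be observed numerically. The following Python sketch is only an illustration and not a substitute for the proof; the function name `ratio` is our own choice, and we assume that floating-point arithmetic is accurate enough for the values shown.

```python
def ratio(n: int) -> float:
    """Return a_n / b_n for a_n = 1/n and b_n = 1/(n + 0.3)."""
    a_n = 1.0 / n
    b_n = 1.0 / (n + 0.3)
    return a_n / b_n


# Sample the ratio at increasingly large n; the values should approach L = 1.
sample_points = (1, 10, 100, 1000, 10000)
ratios = [ratio(n) for n in sample_points]

for n, r in zip(sample_points, ratios):
    print(n, r)
```

The printed ratios decrease toward 1, which agrees with the value $L = 1$ found in the example.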

The Comparison Test (Theorem 9.3.2) and the Limit Comparison Test (Theo- 
rem 9.3.4) are very easy to use, but they have one major drawback, which is that to 
show that one series is convergent, we need to find another series for comparison 
whose convergence is known, and finding that other series for comparison is not 
always easy in practice. We now turn to another convergence test, called the Integral 
Test, which has limited use, but is very effective in some situations, and it does not 
require a second series for comparison. The Integral Test relies upon Type 1 improper integrals, as discussed in Section 6.4. The idea of this convergence test is seen in Figure 9.3.1. If the terms of a series $\sum_{n=1}^\infty a_n$ are non-negative and decreasing, and if a well-behaved function $f\colon [1,\infty) \to \mathbb{R}$ can be found such that $f(n) = a_n$ for all $n \in \mathbb{N}$, then the value of the improper integral $\int_1^\infty f(x)\,dx$ is closely related to the sum of the areas of the rectangles shown in the figure, and that sum is precisely $\sum_{n=1}^\infty a_n$, because all of the rectangles have width 1.

Fig. 9.3.1.


Theorem 9.3.6 (Integral Test). Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$, and let $f\colon [1,\infty) \to \mathbb{R}$ be a function. Suppose that $f$ is continuous and decreasing, that $f(x) \ge 0$ for all $x \in [1,\infty)$, and that $f(n) = a_n$ for all $n \in \mathbb{N}$. Then $\sum_{n=1}^\infty a_n$ is convergent if and only if $\int_1^\infty f(x)\,dx$ is convergent.


Proof. Because $f(x) \ge 0$ for all $x \in [1,\infty)$, then $a_n = f(n) \ge 0$ for all $n \in \mathbb{N}$. Let $\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty a_n$. It follows from Lemma 9.3.1 that $\sum_{n=1}^\infty a_n$ is convergent if and only if $\{s_n\}_{n=1}^\infty$ is bounded.

Because $f$ is monotone, it follows from Exercise 5.4.12 that $f$ is locally integrable. Let $F\colon [1,\infty) \to \mathbb{R}$ be defined by $F(x) = \int_1^x f(t)\,dt$ for all $x \in [1,\infty)$. By Exercise 5.6.4 we see that $F$ is increasing. Hence $\{F(n)\}_{n=1}^\infty$ is increasing. It follows from Exercise 8.3.3 that $\lim_{x\to\infty} F(x)$ exists if and only if the sequence $\{F(n)\}_{n=1}^\infty$ is convergent. In other words, the improper integral $\int_1^\infty f(x)\,dx$ is convergent if and only if $\{F(n)\}_{n=1}^\infty$ is convergent.

By Theorem 5.3.2 (1) we know that $F(n) \ge 0$ for all $n \in \mathbb{N}$. Because $\{F(n)\}_{n=1}^\infty$ is increasing, the Monotone Convergence Theorem (Corollary 8.3.4) implies that $\{F(n)\}_{n=1}^\infty$ is convergent if and only if it is bounded. Hence $\int_1^\infty f(x)\,dx$ is convergent if and only if $\{F(n)\}_{n=1}^\infty$ is bounded.

We now show that $\{s_n\}_{n=1}^\infty$ is bounded if and only if $\{F(n)\}_{n=1}^\infty$ is bounded, which, together with what we saw above, will prove the theorem.

Let $k \in \mathbb{N}$. Because $f$ is decreasing, then $f(k) \ge f(x) \ge f(k+1)$ for all $x \in [k,k+1]$, and hence by Theorem 5.3.2 (3) we see that $f(k+1) \cdot 1 \le \int_k^{k+1} f(x)\,dx \le f(k) \cdot 1$.

Let $n \in \mathbb{N}$ with $n \ge 2$. Then

$$\sum_{i=1}^{n-1} f(i+1) \le \sum_{i=1}^{n-1} \int_i^{i+1} f(x)\,dx \le \sum_{i=1}^{n-1} f(i).$$

By Theorem 5.5.7, extended to $n$ subintervals by induction, it follows that

$$\sum_{i=2}^{n} f(i) \le \int_1^n f(x)\,dx \le \sum_{i=1}^{n-1} f(i).$$

Hence

$$s_n - a_1 \le F(n) \le s_{n-1},$$

and it follows that

$$s_n \le F(n) + a_1 \quad\text{and}\quad F(n) \le s_{n-1}.$$

We deduce that $\{s_n\}_{n=1}^\infty$ is bounded if and only if $\{F(n)\}_{n=1}^\infty$ is bounded.
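
The key inequality of the proof, $s_n - a_1 \le F(n) \le s_{n-1}$, can be checked on a concrete case. The sketch below is a hedged illustration only: it assumes the particular choice $f(x) = 1/x^2$, for which $F(n) = \int_1^n x^{-2}\,dx = 1 - 1/n$, and the helper names `a`, `F`, `s` and `sandwich_holds` are ours. Exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction


def a(n: int) -> Fraction:
    """Term a_n = f(n) for the assumed function f(x) = 1/x^2."""
    return Fraction(1, n * n)


def F(n: int) -> Fraction:
    """F(n) = integral of 1/x^2 from 1 to n, which equals 1 - 1/n."""
    return 1 - Fraction(1, n)


def s(n: int) -> Fraction:
    """Partial sum s_n = a_1 + ... + a_n."""
    return sum((a(i) for i in range(1, n + 1)), Fraction(0))


def sandwich_holds(n: int) -> bool:
    """Check s_n - a_1 <= F(n) <= s_{n-1} for a given n >= 2."""
    return s(n) - a(1) <= F(n) <= s(n - 1)


print(all(sandwich_holds(n) for n in range(2, 60)))
```

Because $F(n) < 1$ for every $n$ in this case, the inequality $s_n \le F(n) + a_1$ bounds every partial sum by 2, which is exactly how the proof transfers boundedness from $\{F(n)\}$ to $\{s_n\}$.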


Example 9.3.7. Let $p \in \mathbb{R}$. The series $\sum_{n=1}^\infty \frac{1}{n^p}$ is called a p-series. Whether or not the series $\sum_{n=1}^\infty \frac{1}{n^p}$ is convergent depends upon the value of $p$.

First, suppose that $p = 1$. Then $\sum_{n=1}^\infty \frac{1}{n^p}$ is the harmonic series $\sum_{n=1}^\infty \frac{1}{n}$, which was shown to be divergent in Example 9.2.4 (5).

Second, suppose that $p = 0$. Then $\sum_{n=1}^\infty \frac{1}{n^p}$ is the constant series $\sum_{n=1}^\infty 1$, which is divergent by the Divergence Test (Theorem 9.2.5).

Third, suppose that $p > 0$ and $p \ne 1$. We will use the Integral Test (Theorem 9.3.6) to test the convergence of $\sum_{n=1}^\infty \frac{1}{n^p}$. Let $f\colon [1,\infty) \to \mathbb{R}$ be defined by $f(x) = x^{-p}$ for all $x \in [1,\infty)$. Then $f(n) = \frac{1}{n^p}$ for all $n \in \mathbb{N}$. We see by Definition 7.2.11 that $f(x) > 0$ for all $x \in [1,\infty)$. By Theorem 7.2.13 (1) we know that $f$ is differentiable, and $f'(x) = -px^{-p-1} < 0$ for all $x \in (1,\infty)$. It follows from Theorem 4.2.4 that $f$ is continuous, and it follows from Theorem 4.5.2 (3) that $f$ is strictly decreasing. We have therefore verified that the Integral Test is applicable, and it follows from that convergence test and Exercise 6.4.1 that $\sum_{n=1}^\infty \frac{1}{n^p}$ is convergent if $p > 1$, and divergent if $0 < p < 1$.

Fourth, suppose that $p < 0$. Then $-p > 0$, and therefore by Exercise 7.2.16 we see that $\lim_{x\to\infty} x^{-p} = \infty$. It follows from Exercise 8.2.10 (2) that $\lim_{n\to\infty} \frac{1}{n^p} = \lim_{n\to\infty} n^{-p} = \infty$. The Divergence Test (Theorem 9.2.5) then implies that $\sum_{n=1}^\infty \frac{1}{n^p}$ is divergent.

Putting the various cases together, we see that $\sum_{n=1}^\infty \frac{1}{n^p}$ is convergent if and only if $p > 1$. For example, the series $\sum_{n=1}^\infty \frac{1}{n^2}$ is convergent, and the series $\sum_{n=1}^\infty \frac{1}{\sqrt{n}}$ is divergent. Moreover, we see that among the p-series, the harmonic series is right at the "boundary" of divergence and convergence, and it is therefore an interesting series even though it is divergent, and it is a useful series for the Comparison Test (Theorem 9.3.2) and the Limit Comparison Test (Theorem 9.3.4). ♦
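
The contrast between $p > 1$ and $p = 1$ can be seen in the partial sums themselves. The following Python sketch is an informal illustration under the assumption that floating-point sums are accurate enough at this scale; the function name `p_series_partial_sum` is hypothetical. No finite computation can establish convergence or divergence, so the output only suggests the behavior proved in the example.

```python
def p_series_partial_sum(p: float, n: int) -> float:
    """Return the n-th partial sum 1/1^p + 1/2^p + ... + 1/n^p."""
    return sum(1.0 / k**p for k in range(1, n + 1))


# For p = 2 the partial sums level off; for p = 1 they keep growing.
for n in (10, 100, 1000, 10000):
    print(n, p_series_partial_sum(2.0, n), p_series_partial_sum(1.0, n))
```

For $p = 2$ the partial sums stay below 2, in line with the bound $s_n \le F(n) + a_1$ from the proof of the Integral Test with $F(n) = 1 - 1/n$, whereas for $p = 1$ the partial sums grow without any visible bound.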


For the final convergence test of this section, we look at series with alternating positive and negative terms. The idea of this convergence test is as follows. Suppose that we have a series of the form $\sum_{n=1}^\infty (-1)^{n-1} a_n$, where we assume that $a_n > 0$ for all $n \in \mathbb{N}$, that $\{a_n\}_{n=1}^\infty$ is decreasing and that $\lim_{n\to\infty} a_n = 0$. Let $\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty (-1)^{n-1} a_n$. Then $s_1 = a_1$, and $s_2 = a_1 - a_2$ is less than or equal to $s_1$, and $s_3 = a_1 - a_2 + a_3$ is greater than or equal to $s_2$ but less than or equal to $s_1$, and so on, as seen in Figure 9.3.2. It then appears, intuitively, as if the terms of $\{s_n\}_{n=1}^\infty$ are getting closer and closer to something, which we will indeed prove using the Nested Interval Theorem (Theorem 8.4.7). Also, we note that whereas the following theorem is stated for series of the form $\sum_{n=1}^\infty (-1)^{n-1} a_n$, the analogous result also holds for series of the form $\sum_{n=1}^\infty (-1)^n a_n$.


Fig. 9.3.2.


Not only is the following convergence test different from the previous convergence 
tests in that it is not about series with non-negative terms, but it has another distinctive 
feature. In general, when we prove that a series is convergent by using a convergence 
test, for example the Comparison Test (Theorem 9.3.2) or the Limit Comparison Test 
(Theorem 9.3.4), we do not learn from the convergence test what the sum of the series 
equals. In the case of alternating series, however, although the following convergence 
test does not tell us the exact value of the sum of the series, Part (2) of the theorem 
gives us a way to estimate the sum of the series very easily. For this reason alternating 
series are particularly convenient to work with. 


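Theorem 9.3.8 (Alternating Series Test). Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Suppose that $a_n > 0$ for all $n \in \mathbb{N}$, that $\{a_n\}_{n=1}^\infty$ is decreasing, and that $\lim_{n\to\infty} a_n = 0$.

1. The series $\sum_{n=1}^\infty (-1)^{n-1} a_n$ is convergent.
2. Let $L = \sum_{n=1}^\infty (-1)^{n-1} a_n$, and let $\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty (-1)^{n-1} a_n$. Then $|L - s_n| \le a_{n+1}$ for all $n \in \mathbb{N}$.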


Proof. We prove both parts of the theorem together. Let $m \in \mathbb{N}$. By hypothesis we know that $a_{2m} \ge a_{2m+1} \ge a_{2m+2} > 0$. Because $s_{2m+1} = s_{2m-1} - a_{2m} + a_{2m+1}$ and $s_{2m+2} = s_{2m+1} - a_{2m+2} = s_{2m} + a_{2m+1} - a_{2m+2}$, it follows that $s_{2m} \le s_{2m+2} \le s_{2m+1} \le s_{2m-1}$. We therefore see that $\{[s_{2n}, s_{2n-1}]\}_{n=1}^\infty$ is a family of closed bounded intervals in $\mathbb{R}$ such that $[s_{2(i+1)}, s_{2(i+1)-1}] \subseteq [s_{2i}, s_{2i-1}]$ for all $i \in \mathbb{N}$. We know by hypothesis that $\lim_{n\to\infty} a_n = 0$, and it then follows from Lemma 8.3.7 that $\{a_{2n}\}_{n=1}^\infty$ is convergent and that $\lim_{n\to\infty} a_{2n} = 0$. Hence $\lim_{n\to\infty} (s_{2n-1} - s_{2n}) = \lim_{n\to\infty} a_{2n} = 0$.

We can now apply both parts of the Nested Interval Theorem (Theorem 8.4.7) to $\{[s_{2n}, s_{2n-1}]\}_{n=1}^\infty$ to deduce that $\bigcap_{n=1}^\infty [s_{2n}, s_{2n-1}] = \{c\}$, where $c = \lim_{n\to\infty} s_{2n} = \lim_{n\to\infty} s_{2n-1}$.

By Exercise 8.3.6 it follows that $\{s_n\}_{n=1}^\infty$ is convergent and $\lim_{n\to\infty} s_n = c$. Hence $\sum_{n=1}^\infty (-1)^{n-1} a_n$ is convergent, and $\sum_{n=1}^\infty (-1)^{n-1} a_n = c$. To conform to the notation of Part (2) of the theorem, we rename $c$ as $L$.

For each $k \in \mathbb{N}$, we know that $L \in [s_{2k+2}, s_{2k+1}] \subseteq [s_{2k}, s_{2k-1}]$, and hence $s_{2k} \le s_{2k+2} \le L \le s_{2k+1} \le s_{2k-1}$. Let $n \in \mathbb{N}$. If $n = 2j$ for some $j \in \mathbb{N}$, then $s_{2j} \le L \le s_{2j+1}$, and hence $|L - s_n| = |L - s_{2j}| \le |s_{2j} - s_{2j+1}| = a_{2j+1} = a_{n+1}$. If $n = 2j - 1$ for some $j \in \mathbb{N}$, then $s_{2j} \le L \le s_{2j-1}$, and hence $|L - s_n| = |L - s_{2j-1}| \le |s_{2j-1} - s_{2j}| = a_{2j} = a_{n+1}$. In either case, we deduce that $|L - s_n| \le a_{n+1}$.


Example 9.3.9. We will prove that the series $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$ is convergent. This series is known as the alternating harmonic series. It is evident that $\frac{1}{n} > 0$ for all $n \in \mathbb{N}$, and that $\left\{\frac{1}{n}\right\}_{n=1}^\infty$ is decreasing. We saw in Example 8.2.4 (2) that $\lim_{n\to\infty} \frac{1}{n} = 0$. We can therefore use Part (1) of the Alternating Series Test (Theorem 9.3.8) to deduce that $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$ is convergent.

Let $L = \sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$. Suppose that we want to estimate the value of $L$ to within two decimal places. That is, we want to find a real number which is no farther from $L$ than 0.005. Let $\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$. By Part (2) of the Alternating Series Test we see that $|L - s_{200}| \le \frac{1}{201} < \frac{1}{200} = 0.005$. Hence $s_{200}$ is the desired approximation of $L$, and a simple numerical calculation yields

$$s_{200} = \frac{1}{1} - \frac{1}{2} + \frac{1}{3} - \cdots - \frac{1}{200} \approx 0.69.$$

The precise value of $L$ is given in Exercise 9.3.10, using a slightly devious method. ♦
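
The "simple numerical calculation" mentioned in Example 9.3.9 can be carried out as in the following Python sketch. This is an illustration under our own naming (`partial_sum`, `s_200`, `error_bound` are hypothetical names); the partial sum is computed exactly with rational arithmetic, and the comparison with $\ln 2$ relies on the value of the sum stated in Exercise 9.3.10.

```python
import math
from fractions import Fraction


def partial_sum(n: int) -> Fraction:
    """Return s_n = 1/1 - 1/2 + 1/3 - ... + (-1)^(n-1)/n exactly."""
    return sum((Fraction((-1) ** (k - 1), k) for k in range(1, n + 1)),
               Fraction(0))


s_200 = partial_sum(200)
error_bound = Fraction(1, 201)  # a_{201}, from Part (2) of Theorem 9.3.8

print(float(s_200))
# Compare with ln 2, the exact sum according to Exercise 9.3.10.
print(abs(math.log(2) - float(s_200)) <= float(error_bound))
```

The distance between $s_{200}$ and $\ln 2$ is in fact well within the guaranteed bound $\frac{1}{201}$, which shows that the estimate in Part (2) of the Alternating Series Test is safe rather than sharp.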


Reflections 


The various convergence tests in this section should be familiar to the reader 
from a calculus course, and they have been included here so that the reader can see 
that this material is susceptible to a rigorous treatment. From the point of view of 
the overall development of real analysis, however, these tests, while very useful for 
concrete computations with series, are not as important as the material in the sections 
preceding and following the present one. On the other hand, the convergence tests in 
the present section are useful for producing examples involving series, and even the 
most detailed treatment of real analysis, or any part of mathematics, that is lacking 
nice examples is hard to understand, or worse. 


Exercises 


Exercise 9.3.1. [Used in Exercise 9.3.6, Exercise 9.3.7 and Lemma 9.4.14.] The purpose of this exercise is to refine the statement of Lemma 9.3.1. Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$. Suppose that $a_n \ge 0$ for all $n \in \mathbb{N}$. Let $\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty a_n$.

(1) Prove that either $\sum_{n=1}^\infty a_n$ is convergent or $\sum_{n=1}^\infty a_n = \infty$.

(2) Prove that if there is some $M \in \mathbb{R}$ such that $s_n \le M$ for all $n \in \mathbb{N}$, then $\sum_{n=1}^\infty a_n$ is convergent and $\sum_{n=1}^\infty a_n \le M$.

(3) Prove that $\sum_{n=1}^\infty a_n$ is convergent if and only if $\{s_n\}_{n=1}^\infty$ has a bounded subsequence. [Use Exercise 8.3.8.]

Exercise 9.3.2. [Used in Theorem 9.3.4.] Prove Theorem 9.3.4 (2) (3).
[Use Exercise 9.2.2.] 


Exercise 9.3.3. [Used in Section 9.3.] Give examples to show that Theorem 9.3.4 (2) 
(3) cannot be made into if and only if statements. 


Exercise 9.3.4. Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$. Suppose that $a_n > 0$ for all $n \in \mathbb{N}$. Prove that if $\sum_{n=1}^\infty a_n$ is convergent then $\sum_{n=1}^\infty (a_{2n-1} + a_{2n})$ is convergent.


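Exercise 9.3.5. [Used in Section 9.2.] Let $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ be series in $\mathbb{R}$. Suppose that $a_n > 0$ and $b_n > 0$ for all $n \in \mathbb{N}$.

(1) Prove that if $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ are convergent, then $\sum_{n=1}^\infty a_n b_n$ is convergent.

(2) In order to guarantee that $\sum_{n=1}^\infty a_n b_n$ is convergent, is it necessary that both $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ are convergent? If yes, explain why. If not, what weaker hypotheses on $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ would suffice?

(3) Find an example of series $\sum_{n=1}^\infty c_n$ and $\sum_{n=1}^\infty d_n$ in $\mathbb{R}$ such that $c_n > 0$ and $d_n > 0$ for all $n \in \mathbb{N}$, that $\sum_{n=1}^\infty c_n$ and $\sum_{n=1}^\infty d_n$ are convergent and that $\sum_{n=1}^\infty c_n d_n \ne \left[\sum_{n=1}^\infty c_n\right] \cdot \left[\sum_{n=1}^\infty d_n\right]$.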

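Exercise 9.3.6. [Used in Lemma 5.8.3.] Let $\left\{\sum_{n=1}^\infty a_n^k\right\}_{k=1}^\infty$ be a sequence of series in $\mathbb{R}$, and let $\sum_{k=1}^\infty b_k$ be a series in $\mathbb{R}$. Suppose that $a_n^k \ge 0$ for all $n, k \in \mathbb{N}$, that $\sum_{k=1}^\infty b_k$ is convergent and that for each $k \in \mathbb{N}$ the series $\sum_{n=1}^\infty a_n^k$ is convergent and $\sum_{n=1}^\infty a_n^k \le b_k$. Let $f\colon \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a bijective function. Such a function exists because $\mathbb{N} \times \mathbb{N}$ is countably infinite, which is a standard fact about the cardinality of the number systems; see [Blo10, Sections 6.5-6.7] for details. Let $\sum_{n=1}^\infty c_n$ be defined as follows. For each $i \in \mathbb{N}$, we have $f(i) = (n_i, k_i)$ for some $n_i, k_i \in \mathbb{N}$, and then let $c_i = a_{n_i}^{k_i}$. Prove that $\sum_{n=1}^\infty c_n$ is convergent and $\sum_{n=1}^\infty c_n \le \sum_{n=1}^\infty b_n$. [Use Exercise 9.3.1 (2).]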


Exercise 9.3.7. The purpose of this exercise is to prove and apply a convergence test 
known as the Cauchy Condensation Test. This convergence test generalizes the idea 
used in Example 9.2.4 (5), where we saw that the harmonic series is divergent by 
looking at partial sums of the form $s_{2^n}$.

(1) Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$. Suppose that $a_n > 0$ for all $n \in \mathbb{N}$, and that $\{a_n\}_{n=1}^\infty$ is decreasing. Prove that $\sum_{n=1}^\infty a_n$ is convergent if and only if $\sum_{n=1}^\infty 2^n a_{2^n}$ is convergent. [Use Exercise 9.3.1 (3).]

(2) Let $p \in \mathbb{R}$. In Example 9.3.7 we saw that $\sum_{n=1}^\infty \frac{1}{n^p}$ is convergent if and only if $p > 1$. Give an alternative (and simpler) proof of this fact using Part (1) of this exercise.


Exercise 9.3.8. [Used in Section 2.8 and Exercise 9.3.9.] In Section 2.8 it was proved, 
using the Least Upper Bound Property but not series, that every real number has a 
unique base p representation, where p is a natural number greater than 1. Now that 
we have the use of series at our disposal, the material in Section 2.8 can be simplified 
in a number of ways. You can do the present exercise even if you have not read 
Section 2.8. 

Let $p \in \mathbb{N}$. Suppose that $p > 1$. Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\{0, \ldots, p-1\}$, where we are using the notation of Definition 2.5.3. We can replace Definition 2.8.4, which uses least upper bounds, with the definition of $\sum_{i=1}^\infty a_i p^{-i}$ as a series.

The purpose of this exercise is to provide simplified proofs of Lemma 2.8.3 and Lemma 2.8.5 using what we have learned about series.

(1) Prove that $\sum_{i=1}^\infty a_i p^{-i}$ is convergent.

(2) Prove that $\sum_{i=1}^\infty a_i p^{-i} = \operatorname{lub}\left\{\sum_{i=1}^n a_i p^{-i} \mid n \in \mathbb{N}\right\}$.

(3) Prove that $0 \le \sum_{i=1}^\infty a_i p^{-i} \le 1$.

(4) Prove that $\sum_{i=1}^\infty a_i p^{-i} = 0$ if and only if $a_i = 0$ for all $i \in \mathbb{N}$.

(5) Prove that $\sum_{i=1}^\infty a_i p^{-i} = 1$ if and only if $a_i = p - 1$ for all $i \in \mathbb{N}$.

(6) Let $m \in \mathbb{N}$. Suppose that $m > 1$. Prove that $\sum_{i=1}^\infty a_i p^{-i} \ge \sum_{i=1}^{m-1} a_i p^{-i}$, where equality holds if and only if $a_i = 0$ for all $i \in \mathbb{N}$ such that $i \ge m$.

(7) Let $m \in \mathbb{N}$. Suppose that $m > 1$, and that $a_{m-1} \ne p - 1$. Prove that

$$\sum_{i=1}^\infty a_i p^{-i} \le \sum_{i=1}^{m-2} a_i p^{-i} + \frac{a_{m-1} + 1}{p^{m-1}},$$

where equality holds if and only if $a_i = p - 1$ for all $i \in \mathbb{N}$ such that $i \ge m$.


Exercise 9.3.9. [Used in Section 2.8.] This exercise makes use of Exercise 9.3.8. Read 
the statements of all the definitions, lemmas and theorems in Section 2.8, though not 
the proofs. Then read the proof of Theorem 2.8.10. Using the definition of $\sum_{i=1}^\infty a_i p^{-i}$ as a series as given in Exercise 9.3.8, simplify the proof of Theorem 2.8.10 as much as possible.


Exercise 9.3.10. [Used in Example 9.3.9 and Example 9.4.11.] It was seen in Example 9.3.9 that the alternating harmonic series $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$ is convergent, and an estimate was given for the sum of the series. In this exercise we make use of Exercise 8.4.13 to find the exact value of the sum of this series.

Let $\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$. Let $\{y_n\}_{n=1}^\infty$ be the sequence defined in Exercise 8.4.13. It was proved in that exercise that $\{y_n\}_{n=1}^\infty$ is convergent.

(1) Prove that $s_{2n} = y_{2n} - y_n + \ln 2$ for all $n \in \mathbb{N}$.
(2) Prove that $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n} = \ln 2$.


Exercise 9.3.11. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences in $\mathbb{R}$. Suppose that $a_n > 0$ for all $n \in \mathbb{N}$, that $\{a_n\}_{n=1}^\infty$ is decreasing, that $\lim_{n\to\infty} a_n = 0$ and that the sequence of partial sums of the series $\sum_{n=1}^\infty b_n$ is bounded. Prove that the series $\sum_{n=1}^\infty a_n b_n$ is convergent. This result, known as Dirichlet's Test, is a generalization of the Alternating Series Test (Theorem 9.3.8). [Use Exercise 5.7.6 (1) and Exercise 9.2.11.]


9.4 Absolute Convergence and Conditional Convergence 


The first question to be asked about a series is whether or not it is convergent, but 
it turns out that even among convergent series there are differences in the nature of 
the convergence. We know from the Divergence Test (Theorem 9.2.5) that if a series
is convergent, then the terms of the series go to zero. Intuitively, the distinction that 
we need to make among convergent series is that for some series the terms go to 
zero so rapidly that even the series of absolute values of the terms is convergent, 
whereas for other series the terms go to zero more slowly, and the series is convergent 
only by virtue of cancellation between positive and negative terms. We do not have a 
formal way of defining how rapidly the terms of the series go to zero, but we can still 
distinguish between these two types of convergent series as follows. 


Definition 9.4.1. Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$. The series $\sum_{n=1}^\infty a_n$ is absolutely convergent if $\sum_{n=1}^\infty |a_n|$ is convergent. The series $\sum_{n=1}^\infty a_n$ is conditionally convergent if $\sum_{n=1}^\infty a_n$ is convergent but not absolutely convergent. △


Example 9.4.2. 


(1) For a series with non-negative terms there is no difference between conver- 
gence and absolute convergence, and hence any convergent series with non-negative 
terms, for example the series in Example 9.2.4 (1), is absolutely convergent. 


(2) We saw in Example 9.3.9 that the alternating harmonic series $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$ is convergent. On the other hand, the series $\sum_{n=1}^\infty \left|(-1)^{n-1} \frac{1}{n}\right| = \sum_{n=1}^\infty \frac{1}{n}$ is the harmonic series, which was shown to be divergent in Example 9.2.4 (5). Therefore $\sum_{n=1}^\infty (-1)^{n-1} \frac{1}{n}$ is not absolutely convergent, but it is conditionally convergent. ♦


We saw in Example 9.4.2 (2) that a series can be convergent but not absolutely 
convergent, and hence such a series is conditionally convergent. On the other hand, 
we see in the following theorem that if a series is absolutely convergent then it is 
convergent. 


Theorem 9.4.3. Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$. If $\sum_{n=1}^\infty a_n$ is absolutely convergent, then $\sum_{n=1}^\infty a_n$ is convergent.


Proof. Suppose that $\sum_{n=1}^\infty a_n$ is absolutely convergent. Then the series $\sum_{n=1}^\infty |a_n|$ is convergent. It follows from Theorem 9.2.6 (3) that $\sum_{n=1}^\infty 2|a_n|$ is convergent.

Let $\sum_{n=1}^\infty b_n$ be defined by $b_n = |a_n| - a_n$ for all $n \in \mathbb{N}$. It is straightforward to verify that $0 \le b_n \le 2|a_n|$ for all $n \in \mathbb{N}$. Because $\sum_{n=1}^\infty 2|a_n|$ is convergent, it follows from the Comparison Test (Theorem 9.3.2) that $\sum_{n=1}^\infty b_n$ is convergent. Because $a_n = |a_n| - b_n$ for all $n \in \mathbb{N}$, Theorem 9.2.6 (2) implies that $\sum_{n=1}^\infty a_n$ is convergent.
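
The two elementary facts used in this proof, namely $0 \le b_n \le 2|a_n|$ and $a_n = |a_n| - b_n$, can be checked mechanically on a sample sequence. The Python sketch below is only an illustration; the choice $a_n = (-1)^{n-1}/n^2$ and the function names `a` and `b` are our own assumptions, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def a(n: int) -> Fraction:
    """Sample term a_n = (-1)^(n-1) / n^2 (an assumed example sequence)."""
    return Fraction((-1) ** (n - 1), n * n)


def b(n: int) -> Fraction:
    """Auxiliary term b_n = |a_n| - a_n from the proof of Theorem 9.4.3."""
    return abs(a(n)) - a(n)


indices = range(1, 51)
bounds_hold = all(0 <= b(n) <= 2 * abs(a(n)) for n in indices)
decomposition_holds = all(a(n) == abs(a(n)) - b(n) for n in indices)
print(bounds_hold, decomposition_holds)
```

Note that $b_n = 0$ whenever $a_n \ge 0$ and $b_n = 2|a_n|$ whenever $a_n < 0$, which is why the Comparison Test must allow terms that are equal to zero.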


Rather than simply categorizing a given series as convergent or divergent, we 
can now categorize it as either absolutely convergent, conditionally convergent or 
divergent. 

We have one more convergence test for series, which we put off until the present 
section because it shows not only convergence, but absolute convergence. This con- 
vergence test is convenient in that it does not require a second series for comparison, 
in contrast to the Comparison Test (Theorem 9.3.2) and the Limit Comparison Test 
(Theorem 9.3.4). On the other hand, we will see there are some useful series, such as 
the alternating harmonic series, that cannot be evaluated with this new convergence 
test. The intuitive idea of this test is as follows. We already know about the convergence of geometric series, as discussed in Example 9.2.4 (4). A series $\sum_{n=1}^\infty a_n$ is a geometric series if $\frac{a_{n+1}}{a_n}$ is constant for all $n \in \mathbb{N}$. Suppose that a series $\sum_{n=1}^\infty a_n$ is not necessarily a geometric series, so that $\frac{a_{n+1}}{a_n}$ is not necessarily constant, but suppose that $\frac{a_{n+1}}{a_n}$ gets closer and closer to some number as $n$ gets larger. That would mean that as $n$ gets larger, the series becomes more and more similar to a geometric series, and so intuitively this series should behave similarly to a geometric series; hence the convergence or divergence of the series should depend upon whether $\lim_{n\to\infty} \frac{a_{n+1}}{a_n}$ is less than 1 or greater than 1. The following theorem confirms this intuitive idea, though for convenience we are able to put the absolute value inside the limit.


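Theorem 9.4.4 (Ratio Test). Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$. Suppose that $a_n \ne 0$ for all $n \in \mathbb{N}$, and that $\left\{\left|\frac{a_{n+1}}{a_n}\right|\right\}_{n=1}^\infty$ is convergent or it diverges to infinity. Let $L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$.

1. If $L \in [0,1)$, then $\sum_{n=1}^\infty a_n$ is absolutely convergent.
2. If $L \in (1,\infty)$ or $L = \infty$, then $\sum_{n=1}^\infty a_n$ is divergent.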


Proof. We will prove Part (1), leaving the remaining part to the reader in Exer- 
cise 9.4.8. 


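(1) Suppose that $L \in [0,1)$. Let $\varepsilon = \frac{1-L}{2}$. Then $\varepsilon > 0$. Hence there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $\left|\left|\frac{a_{n+1}}{a_n}\right| - L\right| < \varepsilon$. Suppose that $n \in \mathbb{N}$ and $n \ge N$. Then $\left|\frac{|a_{n+1}|}{|a_n|} - L\right| < \varepsilon$, which implies that $L - \varepsilon < \frac{|a_{n+1}|}{|a_n|} < L + \varepsilon$, and hence

$$\frac{|a_{n+1}|}{|a_n|} < L + \frac{1-L}{2} = \frac{1+L}{2}.$$

Let $r = \frac{1+L}{2}$. Then $|a_{n+1}| < r|a_n|$. In particular, we deduce that $|a_{N+1}| < r|a_N|$. Because it is also the case that $|a_{N+2}| < r|a_{N+1}|$, we deduce that $|a_{N+2}| < r^2|a_N|$. It can then be proved by induction that $|a_{N+k}| < r^k|a_N|$ for all $k \in \mathbb{N}$; the details are left to the reader. It follows that $|a_p| < \frac{|a_N|}{r^N} r^p$ for all $p \in \mathbb{N}$ such that $p > N$.

Because $L \in [0,1)$, then $r \in [\frac{1}{2}, 1)$. The series $\sum_{n=1}^\infty \frac{|a_N|}{r^N} r^n$ is a geometric series and it is convergent, as discussed in Example 9.2.4 (4). Because $0 < |a_p| < \frac{|a_N|}{r^N} r^p$ for all $p \in \mathbb{N}$ such that $p > N$, it follows from the Comparison Test (Theorem 9.3.2) that $\sum_{n=1}^\infty |a_n|$ is convergent, which means that $\sum_{n=1}^\infty a_n$ is absolutely convergent.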


The reader will have noticed that the Ratio Test (Theorem 9.4.4) does not discuss 
the case where L = 1. We will see in Example 9.4.5 (2) below that it is not possible to 
predict convergence or divergence in that case. 
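
The inconclusive case $L = 1$ can be previewed numerically with two series whose behavior is already known from Example 9.2.4 (5) and Example 9.3.7: the harmonic series, which is divergent, and the p-series with $p = 2$, which is convergent. The following Python sketch is a hedged illustration; the helper names `harmonic_term`, `p2_term` and `ratio_sequence` are our own, and floating-point values only suggest the limit rather than prove it.

```python
def harmonic_term(n: int) -> float:
    """Term 1/n of the (divergent) harmonic series."""
    return 1.0 / n


def p2_term(n: int) -> float:
    """Term 1/n^2 of the (convergent) p-series with p = 2."""
    return 1.0 / n**2


def ratio_sequence(term, n_values):
    """Return |a_{n+1} / a_n| for each n in n_values."""
    return [abs(term(n + 1) / term(n)) for n in n_values]


n_values = (10, 1000, 100000)
print(ratio_sequence(harmonic_term, n_values))
print(ratio_sequence(p2_term, n_values))
```

Both lists of ratios approach 1, even though one series is divergent and the other is convergent, so no conclusion can be drawn from the value $L = 1$ alone.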


Example 9.4.5. 


(1) We will show that the series )7_, 5 os - is absolutely convergent. Clearly all the 
terms of the series are non-zero, so we can use the Ratio Test (Theorem 9.4.4). Using 
Example 8.2.4 (2) and Theorem 8.2.9 (3) we see that 
gn+l 
: an+1 ‘ n+l)! . . 2 
L= lim = lim “ = lim |-| = lim -=0 
noo An noo — noo | nN noon 
ne 


It then follows from Part (1) of the Ratio Test that )°*° z is absolutely convergent. 


n=l yp 
co 


(2) We examine the convergence or divergence of the two series $\sum_{n=1}^\infty \frac{1}{n}$ and $\sum_{n=1}^\infty \frac{1}{n^2}$. For the former, we use Example 8.2.10 to see that

$$\lim_{n\to\infty} \left|\frac{\frac{1}{n+1}}{\frac{1}{n}}\right| = \lim_{n\to\infty} \frac{n}{n+1} = 1.$$

A similar computation shows that the corresponding limit for the second series is also 1; we omit the details. We now see why the Ratio Test does not treat the case $L = 1$, because the series $\sum_{n=1}^\infty \frac{1}{n}$ is divergent by Example 9.2.4 (5), and the series $\sum_{n=1}^\infty \frac{1}{n^2}$ is convergent by Example 9.3.7. That is, both convergence and divergence can occur when $L = 1$.

(3) In Part (2) of this example we saw that the Ratio Test failed to determine convergence or divergence when $\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = 1$. We now see a different way in which the Ratio Test can fail, which is when the limit $\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$ does not exist. Let $\sum_{n=1}^\infty c_n$ be the series


1 1 1 1 1 


R031 + 3232 7 9233 T page TF pags 


which is given by 
se if n is even 


Ch = : , 
if n is odd. 


I 
2-130? 
To apply the Ratio Test, we observe that 


;,  ifniseven 


Cnt+1 
Cn 


1 . . 
Tz, ifnis odd. 


Cn+1 
n 


By Exercise 8.2.2 we know that $\left\{\left|\frac{c_{n+1}}{c_n}\right|\right\}_{n=1}^\infty$ is divergent, and hence we cannot use the Ratio Test to evaluate the convergence of $\sum_{n=1}^\infty c_n$. On the other hand, it is simple to see that the series $\sum_{n=1}^\infty c_n$ is convergent, by using the Comparison Test
(Theorem 9.3.2) with the geometric series —. ©) 


fone} 
n= 


Although it might appear at first to the reader that the difference between abso- 
lutely convergent series and conditionally convergent series is merely a technicality, 
there is in fact a big difference between the behavior of these two types of series. We 
now see two topics that highlight just how very different these two types of series are. 

Our first topic concerns the multiplication of series. As was proved in Theorem 9.2.6 (1), the convergence of the sum of two convergent series works out nicely; more precisely, we saw that $\sum_{n=1}^\infty (a_n + b_n) = \sum_{n=1}^\infty a_n + \sum_{n=1}^\infty b_n$ whenever $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ are convergent. By contrast, the situation for the product of convergent series is not as simple.

What interests us about products of series is not the analog of $\sum_{n=1}^\infty (a_n + b_n)$, which would be $\sum_{n=1}^\infty a_n b_n$ and which is not particularly useful to us, but rather the analog of $\sum_{n=1}^\infty a_n + \sum_{n=1}^\infty b_n$, which is $\left[\sum_{n=1}^\infty a_n\right] \cdot \left[\sum_{n=1}^\infty b_n\right]$. In particular, we would
like to know whether there is a convenient way to compute the product of two series, 
because multiplying each term in one series by each term in the other series, as is 
done when we multiply two finite sums, would be tricky with series because of their 
infinite nature. 

As an example of multiplying finite sums, let us look at the product $[a_0 + a_1 + a_2] \cdot [b_0 + b_1 + b_2]$. This product is expanded by multiplying every term in $a_0 + a_1 + a_2$ with every term in $b_0 + b_1 + b_2$, resulting in

$$[a_0 + a_1 + a_2] \cdot [b_0 + b_1 + b_2] = a_0b_0 + a_0b_1 + a_0b_2 + a_1b_0 + a_1b_1 + a_1b_2 + a_2b_0 + a_2b_1 + a_2b_2.$$

For convenience, we can rearrange the terms of this product by grouping elements by the sums of their subscripts, which yields

$$[a_0 + a_1 + a_2] \cdot [b_0 + b_1 + b_2] = a_0b_0 + (a_0b_1 + a_1b_0) + (a_0b_2 + a_1b_1 + a_2b_0) + (a_1b_2 + a_2b_1) + a_2b_2.$$

We will use this idea of grouping elements by the sums of their subscripts when we treat products of series; the first three terms in the above equation are the typical ones, whereas the last two terms do not quite fit the pattern, because, in contrast to series, the sums $a_0 + a_1 + a_2$ and $b_0 + b_1 + b_2$ stopped at the subscript $n = 2$.

For notational convenience, it will be simpler to consider the product of two series that start with $n = 0$ rather than $n = 1$, that is, series of the form $\sum_{n=0}^\infty a_n$. There is no loss of generality in having our series start with $n = 0$, because any series of the form $\sum_{n=1}^\infty a_n$ can be rewritten as $\sum_{n=0}^\infty a_{n+1}$.

The following theorem shows that products of series work out exactly as expected 
when at least one of the series is absolutely convergent. We start with a definition. 


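Definition 9.4.6. Let $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ be series in $\mathbb{R}$. The Cauchy product of $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ is the series $\sum_{n=0}^\infty e_n$ defined by $e_n = \sum_{k=0}^n a_k b_{n-k}$ for all $n \in \mathbb{N} \cup \{0\}$. △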


Observe that the Cauchy product of any two series in R is always defined, even if 
the original series are divergent. 
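
The formula $e_n = \sum_{k=0}^n a_k b_{n-k}$ groups products by the sum of their subscripts, and it can be computed directly for the first few terms of any two series. The following Python sketch is a minimal illustration; the function name `cauchy_product` and the sample values are our own assumptions, and the lists hold only finitely many initial terms $a_0, a_1, \ldots$ and $b_0, b_1, \ldots$.

```python
def cauchy_product(a, b):
    """Return [e_0, ..., e_{m-1}], where e_n = sum_{k=0}^{n} a[k] * b[n-k].

    Only the first m = min(len(a), len(b)) terms are returned, because
    each e_n needs a_0, ..., a_n and b_0, ..., b_n.
    """
    m = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(m)]


# Sample initial terms a_0, a_1, a_2 and b_0, b_1, b_2.
a_terms = [1, 2, 3]
b_terms = [4, 5, 6]
e_terms = cauchy_product(a_terms, b_terms)
print(e_terms)
```

With these sample values, $e_0 + e_1 + e_2$ is smaller than the full product $(1 + 2 + 3)(4 + 5 + 6)$, because the groups $(a_1b_2 + a_2b_1)$ and $a_2b_2$ would belong to $e_3$ and $e_4$, which need more terms of the two series than the lists contain.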


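Theorem 9.4.7. Let $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ be series in $\mathbb{R}$. Let $\sum_{n=0}^\infty e_n$ be the Cauchy product of $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$. Suppose that $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ are convergent, and that $\sum_{n=0}^\infty a_n$ or $\sum_{n=0}^\infty b_n$ is absolutely convergent. Then $\sum_{n=0}^\infty e_n$ is convergent and

$$\left[\sum_{n=0}^\infty a_n\right] \cdot \left[\sum_{n=0}^\infty b_n\right] = \sum_{n=0}^\infty e_n.$$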
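
Proof. Without loss of generality, assume that $\sum_{n=0}^\infty a_n$ is absolutely convergent.

Let $\{s_n\}_{n=0}^\infty$ and $\{t_n\}_{n=0}^\infty$ and $\{u_n\}_{n=0}^\infty$ be the sequences of partial sums of $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ and $\sum_{n=0}^\infty e_n$, respectively. Let $A = \sum_{n=0}^\infty a_n$ and $B = \sum_{n=0}^\infty b_n$. Hence $\lim_{n\to\infty} s_n = A$ and $\lim_{n\to\infty} t_n = B$.

As a preliminary step, let $\{h_n\}_{n=0}^\infty$ be defined by $h_n = \sum_{j=0}^n a_j[t_{n-j} - B]$ for all $n \in \mathbb{N} \cup \{0\}$. We will show that $\{h_n\}_{n=0}^\infty$ is convergent and $\lim_{n\to\infty} h_n = 0$. Let $\varepsilon > 0$. Because $\lim_{n\to\infty} t_n = B$, then by Exercise 8.2.5 we know that $\{t_n - B\}_{n=0}^\infty$ is convergent and $\lim_{n\to\infty} (t_n - B) = 0$. It follows from Lemma 8.2.6 that $\{t_n - B\}_{n=0}^\infty$ is bounded. Therefore there is some $Q \in \mathbb{R}$ such that $|t_k - B| \le Q$ for all $k \in \mathbb{N} \cup \{0\}$. We may assume that $Q > 0$. Because $\sum_{n=0}^\infty a_n$ is absolutely convergent, then $\sum_{n=0}^\infty |a_n|$ is convergent. Let $P = \sum_{n=0}^\infty |a_n| + 1$. Then $P > 0$.

Because $\lim_{n\to\infty} t_n = B$, there is some $N \in \mathbb{N}$ such that $k \in \mathbb{N} \cup \{0\}$ and $k \ge N$ imply $|t_k - B| < \frac{\varepsilon}{2P}$. Because $\sum_{n=0}^\infty |a_n|$ is convergent, then by Exercise 9.2.11 there is some $M \in \mathbb{N}$ such that $n, m \in \mathbb{N} \cup \{0\}$ and $n > m \ge M$ imply $\left|\sum_{k=m+1}^n |a_k|\right| < \frac{\varepsilon}{2Q}$. Let $J = \max\{N, M\}$, and let $K = 2J$.

Suppose that $n \in \mathbb{N} \cup \{0\}$ and $n \ge K$. Using Exercise 2.5.3, we see that

$$\begin{aligned}
|h_n - 0| &= \left|\sum_{j=0}^n a_j[t_{n-j} - B]\right| \le \sum_{j=0}^n |a_j| \cdot |t_{n-j} - B| \\
&= \sum_{j=0}^J |a_j| \cdot |t_{n-j} - B| + \sum_{j=J+1}^n |a_j| \cdot |t_{n-j} - B| \\
&\le \frac{\varepsilon}{2P} \sum_{j=0}^J |a_j| + Q \sum_{j=J+1}^n |a_j| \\
&< \frac{\varepsilon}{2P} \cdot P + Q \cdot \frac{\varepsilon}{2Q} = \varepsilon.
\end{aligned}$$

We deduce that $\{h_n\}_{n=0}^\infty$ is convergent and $\lim_{n\to\infty} h_n = 0$.

Let $n \in \mathbb{N} \cup \{0\}$. Then

$$\begin{aligned}
u_n &= \sum_{j=0}^n e_j = \sum_{j=0}^n \sum_{k=0}^j a_k b_{j-k} = \sum_{k=0}^n a_k \sum_{j=k}^n b_{j-k} = \sum_{k=0}^n a_k t_{n-k} \\
&= \sum_{k=0}^n a_k[t_{n-k} - B] + \sum_{k=0}^n a_k B = h_n + s_n B.
\end{aligned}$$

We know that $\lim_{n\to\infty} s_n = A$ and $\lim_{n\to\infty} h_n = 0$, and therefore by Theorem 8.2.9 we deduce that $\{u_n\}_{n=0}^\infty$ is convergent and $\lim_{n\to\infty} u_n = \lim_{n\to\infty} h_n + \lim_{n\to\infty} s_n B = AB$. We conclude that $\sum_{n=0}^\infty e_n$ is convergent and $\sum_{n=0}^\infty e_n = \left[\sum_{n=0}^\infty a_n\right] \cdot \left[\sum_{n=0}^\infty b_n\right]$.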


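Example 9.4.8. Let $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n = \sum_{n=0}^{\infty} \frac{1}{2^{n+1}}$. This series can also be written as $\sum_{n=1}^{\infty} \frac{1}{2^n}$, and it is a geometric series, as discussed in Example 9.2.4 (4). By that example we know that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are convergent and

\[
\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n = \frac{\frac{1}{2}}{1 - \frac{1}{2}} = 1.
\]

Because all of the terms of this series are positive, then it is absolutely convergent. Let $\sum_{n=0}^{\infty} e_n$ be the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$. If $n \in \mathbb{N} \cup \{0\}$, then

\[
e_n = \sum_{k=0}^{n} a_k b_{n-k} = \sum_{k=0}^{n} \frac{1}{2^{k+1}} \cdot \frac{1}{2^{(n-k)+1}} = \sum_{k=0}^{n} \frac{1}{2^{n+2}} = \frac{n+1}{2^{n+2}}.
\]

It now follows from Theorem 9.4.7 that $\sum_{n=0}^{\infty} e_n = \sum_{n=0}^{\infty} \frac{n+1}{2^{n+2}} = \sum_{n=1}^{\infty} \frac{n}{2^{n+1}}$ is convergent and

\[
\sum_{n=1}^{\infty} \frac{n}{2^{n+1}} = \sum_{n=0}^{\infty} e_n = \Bigl[\sum_{n=0}^{\infty} a_n\Bigr] \cdot \Bigl[\sum_{n=0}^{\infty} b_n\Bigr] = 1 \cdot 1 = 1.
\]

It would have been straightforward to use the Ratio Test (Theorem 9.4.4) to show that $\sum_{n=1}^{\infty} \frac{n}{2^{n+1}}$ is absolutely convergent, but that would not have told us what the sum of the series is. ♦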
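The computation in Example 9.4.8 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `cauchy_product` and the cutoff `N` are illustrative assumptions, and exact rational arithmetic is used so that the closed form $e_n = \frac{n+1}{2^{n+2}}$ can be compared term by term.

```python
from fractions import Fraction

def cauchy_product(a, b):
    """Return [e_0, ..., e_{m-1}], where e_n = sum_{k=0}^{n} a[k] * b[n-k]."""
    m = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(m)]

N = 40  # number of terms inspected (an arbitrary choice for illustration)
a = [Fraction(1, 2 ** (n + 1)) for n in range(N)]  # a_n = b_n = 1/2^(n+1)
e = cauchy_product(a, a)

# Closed form derived in the example: e_n = (n+1)/2^(n+2).
closed_form = [Fraction(n + 1, 2 ** (n + 2)) for n in range(N)]
matches_closed_form = (e == closed_form)

# Partial sum of the Cauchy product; by Theorem 9.4.7 the full sum is 1 * 1 = 1.
partial_sum = float(sum(e))
print(matches_closed_form, partial_sum)
```

A finite partial sum only suggests the limit; the theorem is what guarantees it.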


It is not possible to strengthen the conclusion of Theorem 9.4.7 and say that the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is absolutely convergent, because, as seen in Exercise 9.4.10, there are series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ such that $\sum_{n=0}^{\infty} a_n$ is absolutely convergent and $\sum_{n=0}^{\infty} b_n$ is conditionally convergent, and the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is conditionally convergent. However, as seen in Exercise 9.4.11, if $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are both absolutely convergent, then the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is absolutely convergent.

It is not possible to weaken the hypotheses of Theorem 9.4.7 and say only that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are convergent, because, as seen in the first part of the following example, due to Augustin Louis Cauchy (1789-1857), there are series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ such that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are conditionally convergent, and the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is divergent. On the other hand, as seen in the second part of the example, the Cauchy product can be convergent (and in fact absolutely convergent) even if the series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are poorly behaved.


Example 9.4.9.

(1) Let $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n = \sum_{n=0}^{\infty} (-1)^n \frac{1}{\sqrt{n+1}}$. This series can also be written as $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{\sqrt{n}}$, and it is proved in Exercise 9.4.1 that this series is conditionally convergent. Let $\sum_{n=0}^{\infty} e_n$ be the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$. Let $n \in \mathbb{N} \cup \{0\}$. Then

\[
e_n = \sum_{k=0}^{n} a_k b_{n-k} = \sum_{k=0}^{n} \frac{(-1)^k}{\sqrt{k+1}} \cdot \frac{(-1)^{n-k}}{\sqrt{(n-k)+1}} = (-1)^n \sum_{k=0}^{n} \frac{1}{\sqrt{(k+1)[(n-k)+1]}}.
\]

It is left to the reader to verify that $(k+1)[(n-k)+1] \le (n+1)^2$ for all $k \in \{0, \ldots, n\}$. Hence $\frac{1}{\sqrt{(k+1)[(n-k)+1]}} \ge \frac{1}{n+1}$ for all $k \in \{0, \ldots, n\}$. Therefore $|e_n| \ge \sum_{k=0}^{n} \frac{1}{n+1} = 1$.

It follows that $\{e_n\}_{n=0}^{\infty}$ does not converge to 0, and by the Divergence Test (Theorem 9.2.5) we conclude that $\sum_{n=0}^{\infty} e_n$ is divergent; the fact that the series starts at $n = 0$ rather than $n = 1$ makes no difference for the Divergence Test. Hence, the conclusion of Theorem 9.4.7 does not hold for $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$.
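(2) Let

\[
\sum_{n=0}^{\infty} c_n = 1 + 1 + 2 + 2^2 + 2^3 + 2^4 + \cdots,
\]

and

\[
\sum_{n=0}^{\infty} d_n = -1 + 1 + 1 + 1 + 1 + \cdots.
\]

The series $\sum_{n=0}^{\infty} c_n$ and $\sum_{n=0}^{\infty} d_n$ are divergent. Let $\sum_{n=0}^{\infty} f_n$ be the Cauchy product of $\sum_{n=0}^{\infty} c_n$ and $\sum_{n=0}^{\infty} d_n$ (which is defined for any two series, convergent or not). Then $f_0 = -1$, and if $n \in \mathbb{N}$ then

\[
f_n = \sum_{k=0}^{n} c_k d_{n-k} = 1 \cdot 1 + 1 \cdot 1 + 2 \cdot 1 + 2^2 \cdot 1 + \cdots + 2^{n-2} \cdot 1 + 2^{n-1} \cdot (-1) = 2^{n-1} - 2^{n-1} = 0,
\]

where the equality before last follows from Exercise 2.5.12 (3). It is evident that $\sum_{n=0}^{\infty} f_n$ is absolutely convergent. ♦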
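Both parts of Example 9.4.9 lend themselves to a quick numerical check. The sketch below is illustrative only: the helper `cauchy_term` and the ranges 200 and 30 are assumptions made for the demonstration, and floating-point arithmetic is used in Part (1).

```python
import math

def cauchy_term(a, b, n):
    """e_n = sum_{k=0}^{n} a(k) * b(n-k), with a and b given as functions of the index."""
    return sum(a(k) * b(n - k) for k in range(n + 1))

# Part (1): a_n = b_n = (-1)^n / sqrt(n+1).
def alt_root(n):
    return (-1) ** n / math.sqrt(n + 1)

e = [cauchy_term(alt_root, alt_root, n) for n in range(200)]
min_abs_e = min(abs(term) for term in e)  # the example shows |e_n| >= 1 for every n

# Part (2): c_0 = 1, c_n = 2^(n-1) for n >= 1; d_0 = -1, d_n = 1 for n >= 1.
def c(n):
    return 1 if n == 0 else 2 ** (n - 1)

def d(n):
    return -1 if n == 0 else 1

f = [cauchy_term(c, d, n) for n in range(30)]
print(min_abs_e, f[:6])
```

In Part (1) the terms of the Cauchy product stay at least 1 in absolute value, so they cannot tend to 0; in Part (2) every term after the first vanishes even though both factors diverge.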


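We saw in Example 9.4.9 (1) that if $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are conditionally convergent, then it is not necessarily the case that the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is convergent. However, if $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are conditionally convergent and the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is convergent, then, as the reader is asked to show in Exercise 10.4.11, the sum of the Cauchy product must equal $\bigl[\sum_{n=0}^{\infty} a_n\bigr] \cdot \bigl[\sum_{n=0}^{\infty} b_n\bigr]$.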

Example 9.4.9 (1) shows that conditionally convergent series are not as well 
behaved as absolutely convergent series, at least with respect to the Cauchy product 
of series. We now turn to an even more surprising difference between absolutely 
convergent series and conditionally convergent series, which is the issue of rearranging 
the terms of a series. Actually, the problem that we saw in regard to the Cauchy product 
of two conditionally convergent series is a special case of the more general issue 
of the rearrangement of series, because the way the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ works is by regrouping terms according to the sums of their subscripts, and that regrouping involves a rearrangement of the order of terms of the form $a_i b_j$.

We start our discussion of rearrangements of series with the analogous question for 
finite sums. First, we need to remind ourselves what the sum of finitely many numbers, 
for example $3 + 7 + 2$, means. In principle, addition is defined for only two numbers at a time. However, as we saw in Exercise 2.5.19, it is possible to use Definition by Recursion to define $a_1 + a_2 + \cdots + a_n$ for any $n \in \mathbb{N}$ and any $a_1, \ldots, a_n \in \mathbb{R}$. Once we have that definition, it is possible to make use of the Commutative and Associative Laws for Addition to rearrange the order of any finite sum without changing the value of the sum. For example, we see that $3 + 7 + 2 = (3 + 7) + 2 = (7 + 3) + 2 = 7 + (3 + 2) = 7 + (2 + 3) = 7 + 2 + 3$.

Does rearranging the terms of a series, which is an infinite sum, change the value 
of the sum of the series? Of course, that question is only relevant to convergent series. 
The answer to this question is not at all obvious, because the definition of the sum 
of a series is very much dependent upon the order of the terms; if the terms of a 
series are rearranged, then the sequence of partial sums is changed. In fact, as seen in 
Example 9.4.11 below, the sum of a series can change if the terms of the series are 
rearranged. To state that example, we need the following definition. 


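Definition 9.4.10. Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series in $\mathbb{R}$. The series $\sum_{n=1}^{\infty} b_n$ is a rearrangement of $\sum_{n=1}^{\infty} a_n$ if there is a bijective function $f \colon \mathbb{N} \to \mathbb{N}$ such that $b_n = a_{f(n)}$ for all $n \in \mathbb{N}$. △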


A bijective function from a set to itself is often called a “permutation” of the set, 
though we will not use that terminology. 


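Example 9.4.11. It was seen in Example 9.4.2 (2) that the alternating harmonic series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$ is conditionally convergent. Let $S = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$. Writing out the terms of the series, we see that

\[
S = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \frac{1}{8} + \cdots.
\]

It follows from Theorem 9.2.6 (3) that

\[
\frac{S}{2} = \frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \frac{1}{10} - \frac{1}{12} + \cdots.
\]

Using Exercise 9.2.7 we see that

\[
\frac{S}{2} = 0 + \frac{1}{2} + 0 - \frac{1}{4} + 0 + \frac{1}{6} + 0 - \frac{1}{8} + \cdots.
\]

By adding the above series to the alternating harmonic series, and then using Theorem 9.2.6 (1), we deduce that

\[
\frac{3S}{2} = 1 + 0 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + 0 + \frac{1}{7} - \frac{1}{4} + \cdots.
\]

By the obvious variant of Exercise 9.2.7 we conclude that

\[
\frac{3S}{2} = 1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots.
\]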


This last series is a rearrangement of the alternating harmonic series, where the 
pattern for the rearrangement is that after every two terms with odd denominators we 
have one term with an even denominator. Though in practice we rarely write out a 
rearrangement of a series using the formal definition given in Definition 9.4.10, if we 
wished to do so in the present case, we could denote the alternating harmonic series by $\sum_{n=1}^{\infty} a_n$, and then the rearrangement given above is the series $\sum_{n=1}^{\infty} a_{f(n)}$, where $f \colon \mathbb{N} \to \mathbb{N}$ is defined by

\[
f(n) = \begin{cases}
4k - 3, & \text{if } n = 3k - 2 \text{ for some } k \in \mathbb{N} \\
4k - 1, & \text{if } n = 3k - 1 \text{ for some } k \in \mathbb{N} \\
2k, & \text{if } n = 3k \text{ for some } k \in \mathbb{N}.
\end{cases}
\]

Finally, we see by Exercise 9.3.10 (2) that $S = \ln 2$, and therefore $S \ne 0$ by Theorem 7.2.3 (1) and Lemma 7.2.4. Hence $\frac{3S}{2} \ne S$, which means that a rearrangement of a convergent series can be convergent and have a different sum from the original series. ♦
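The rearrangement in Example 9.4.11 can be explored numerically. The following sketch implements the bijection $f$ from the example and compares partial sums of the original and rearranged series with $\ln 2$ and $\frac{3}{2}\ln 2$; the cutoff `N` is an arbitrary assumption, and partial sums only approximate the limits.

```python
import math

def a(n):
    """n-th term of the alternating harmonic series, n >= 1."""
    return (-1) ** (n - 1) / n

def f(n):
    """The bijection from the example: two odd indices, then one even index."""
    k, r = divmod(n + 2, 3)  # n = 3k-2, 3k-1 or 3k  <=>  r = 0, 1 or 2
    if r == 0:
        return 4 * k - 3
    if r == 1:
        return 4 * k - 1
    return 2 * k

N = 30_000  # number of terms summed (a multiple of 3, chosen for illustration)
original = sum(a(n) for n in range(1, N + 1))
rearranged = sum(a(f(n)) for n in range(1, N + 1))

first_indices = [f(n) for n in range(1, 13)]  # order in which the terms are taken
print(original, math.log(2))
print(rearranged, 1.5 * math.log(2))
```

The list `first_indices` shows the pattern described in the text: after every two odd indices comes one even index.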


The strange behavior seen in Example 9.4.11 occurred with a conditionally con- 
vergent series. The following theorem shows that no such problem can occur with an 
absolutely convergent series. 


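Theorem 9.4.12. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. Suppose that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent. If $\sum_{n=1}^{\infty} b_n$ is a rearrangement of $\sum_{n=1}^{\infty} a_n$, then $\sum_{n=1}^{\infty} b_n$ is absolutely convergent and $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} a_n$.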


Proof. Suppose that $\sum_{n=1}^{\infty} b_n$ is a rearrangement of $\sum_{n=1}^{\infty} a_n$. First, we show that $\sum_{n=1}^{\infty} b_n$ is absolutely convergent. Let $\{u_n\}_{n=1}^{\infty}$ and $\{v_n\}_{n=1}^{\infty}$ be the sequences of partial sums of $\sum_{n=1}^{\infty} |a_n|$ and $\sum_{n=1}^{\infty} |b_n|$, respectively. By hypothesis we know that $\{u_n\}_{n=1}^{\infty}$ is convergent, and by Lemma 9.3.1 we therefore know that $\{u_n\}_{n=1}^{\infty}$ is bounded. Hence, there is some $M \in \mathbb{R}$ such that $|u_n| \le M$ for all $n \in \mathbb{N}$.

Let $k \in \mathbb{N}$. Because $\sum_{n=1}^{\infty} b_n$ is a rearrangement of $\sum_{n=1}^{\infty} a_n$, there is a bijective function $f \colon \mathbb{N} \to \mathbb{N}$ such that $b_n = a_{f(n)}$ for all $n \in \mathbb{N}$. Let $J = \max\{f(1), \ldots, f(k)\}$. It follows that $0 \le v_k = |b_1| + \cdots + |b_k| = |a_{f(1)}| + \cdots + |a_{f(k)}| \le |a_1| + \cdots + |a_J| = u_J \le M$. It follows that $|v_n| \le M$ for all $n \in \mathbb{N}$, and therefore $\{v_n\}_{n=1}^{\infty}$ is bounded. By Lemma 9.3.1 we deduce that $\sum_{n=1}^{\infty} |b_n|$ is convergent, and therefore $\sum_{n=1}^{\infty} b_n$ is absolutely convergent.

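Next, we show that $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} a_n$. Let $\{s_n\}_{n=1}^{\infty}$ and $\{t_n\}_{n=1}^{\infty}$ be the sequences of partial sums of $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$, respectively. Because $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are absolutely convergent, then $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent by Theorem 9.4.3. Hence $\{s_n\}_{n=1}^{\infty}$ and $\{t_n\}_{n=1}^{\infty}$ are convergent. We will show that $\{t_n - s_n\}_{n=1}^{\infty}$ is convergent and that $\lim_{n \to \infty} (t_n - s_n) = 0$. It will then follow from Theorem 8.2.9 (1) that $\sum_{n=1}^{\infty} b_n = \lim_{n \to \infty} t_n = \lim_{n \to \infty} [s_n + (t_n - s_n)] = \sum_{n=1}^{\infty} a_n + 0 = \sum_{n=1}^{\infty} a_n$.

Let $\varepsilon > 0$. Because $\sum_{n=1}^{\infty} |a_n|$ is convergent, then by Exercise 9.2.11 there is some $N \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n > m \ge N$ imply $\bigl| \sum_{k=m+1}^{n} |a_k| \bigr| < \varepsilon$, which means that $\sum_{i=m+1}^{n} |a_i| < \varepsilon$.

Let $P = \max\{f^{-1}(1), \ldots, f^{-1}(N)\}$ and $Q = \max\{f(1), \ldots, f(P)\}$. Then we see that $\{1, \ldots, N\} \subseteq \{f(1), \ldots, f(P)\} \subseteq \{1, \ldots, Q\}$. Hence $N \le P \le Q$.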

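Suppose that $m \in \mathbb{N}$ and $m \ge Q + 1$. Then

\[
\begin{aligned}
|(t_m - s_m) - 0| &= \Bigl| \sum_{j=1}^{m} b_j - \sum_{i=1}^{m} a_i \Bigr| = \Bigl| \sum_{j=1}^{m} a_{f(j)} - \sum_{i=1}^{m} a_i \Bigr| \\
&= \Bigl| \Bigl[ \sum_{j=1}^{P} a_{f(j)} + \sum_{j=P+1}^{m} a_{f(j)} \Bigr] - \Bigl[ \sum_{i=1}^{N} a_i + \sum_{i=N+1}^{m} a_i \Bigr] \Bigr|.
\end{aligned}
\]

Because $\{1, \ldots, N\} \subseteq \{f(1), \ldots, f(P)\}$, then in this last expression, each term of the form $a_i$ in the summation $\sum_{i=1}^{N} a_i$ will cancel out with a term of the form $a_{f(j)}$ in the summation $\sum_{j=1}^{P} a_{f(j)}$. Some other terms might also cancel out, and what remains inside the absolute value after all possible canceling is a summation of the form $\sum_{i \in V} a_i - \sum_{i \in W} a_i$, for some sets $V, W \subseteq \{N+1, \ldots, Q+1\}$ such that $V \cap W = \emptyset$. Using Exercise 2.5.3, we see that

\[
|(t_m - s_m) - 0| = \Bigl| \sum_{i \in V} a_i - \sum_{i \in W} a_i \Bigr| \le \sum_{i \in V} |a_i| + \sum_{i \in W} |a_i| \le \sum_{i=N+1}^{Q+1} |a_i| < \varepsilon,
\]

where the last inequality holds because $Q + 1 > N$. We conclude that $\{t_n - s_n\}_{n=1}^{\infty}$ is convergent and $\lim_{n \to \infty} (t_n - s_n) = 0$.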
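For contrast with Example 9.4.11, the conclusion of Theorem 9.4.12 can be illustrated with an absolutely convergent series. The sketch below applies the same bijection $f$ as in that example to $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n^2}$; this particular series and the cutoff `N` are assumptions chosen for illustration and do not appear in the text.

```python
def f(n):
    """Bijection N -> N from Example 9.4.11 (two odd indices, then one even index)."""
    k, r = divmod(n + 2, 3)
    return 4 * k - 3 if r == 0 else 4 * k - 1 if r == 1 else 2 * k

def a(n):
    """n-th term of sum (-1)^(n-1) / n^2, an absolutely convergent series."""
    return (-1) ** (n - 1) / n ** 2

N = 30_000
s_N = sum(a(n) for n in range(1, N + 1))     # partial sum of the original series
t_N = sum(a(f(n)) for n in range(1, N + 1))  # partial sum of the rearrangement
gap = abs(t_N - s_N)                         # the proof shows t_n - s_n tends to 0
print(s_N, t_N, gap)
```

Here the two partial sums agree to several decimal places, whereas for the alternating harmonic series the same rearrangement changed the sum by a factor of $\frac{3}{2}$.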


Comparing Theorem 9.4.12 and Example 9.4.11 shows that rearrangements of con- 
ditionally convergent series are not as well behaved as rearrangements of absolutely 
convergent series. In fact, as will be seen in Theorem 9.4.15 below, rearrangements of 
conditionally convergent series behave even worse than might be imagined from just 
the evidence of Example 9.4.11. We start with the following definition and lemma, 
which help clarify the difference between absolutely convergent and conditionally 
convergent series. 


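Definition 9.4.13. Let $a \in \mathbb{R}$. The positive part of $a$, denoted $a^+$, and the negative part of $a$, denoted $a^-$, are defined by

\[
a^+ = \begin{cases} a, & \text{if } a \ge 0 \\ 0, & \text{if } a \le 0 \end{cases}
\qquad \text{and} \qquad
a^- = \begin{cases} 0, & \text{if } a \ge 0 \\ -a, & \text{if } a \le 0. \end{cases}
\]
△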


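Lemma 9.4.14. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$.

1. The series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent if and only if $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$ are convergent, and if they are convergent then $\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-$.
2. If $\sum_{n=1}^{\infty} a_n$ is conditionally convergent, then $\sum_{n=1}^{\infty} a_n^+ = \infty$ and $\sum_{n=1}^{\infty} a_n^- = \infty$.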


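Proof. Observe that for each $n \in \mathbb{N}$, we can evaluate $a_n^+$ and $a_n^-$ by the equations $a_n^+ = \frac{1}{2}|a_n| + \frac{1}{2}a_n$ and $a_n^- = \frac{1}{2}|a_n| - \frac{1}{2}a_n$. Hence $a_n = a_n^+ - a_n^-$ and $|a_n| = a_n^+ + a_n^-$.

(1) Suppose that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent. Then $\sum_{n=1}^{\infty} |a_n|$ is convergent, and by Theorem 9.4.3 we know that $\sum_{n=1}^{\infty} a_n$ is convergent. Using the preliminary observation about $a_n^+$ and $a_n^-$, it follows from Theorem 9.2.6 that $\sum_{n=1}^{\infty} a_n^+$ is convergent and $\sum_{n=1}^{\infty} a_n^+ = \frac{1}{2}\sum_{n=1}^{\infty} |a_n| + \frac{1}{2}\sum_{n=1}^{\infty} a_n$, and that $\sum_{n=1}^{\infty} a_n^-$ is convergent and $\sum_{n=1}^{\infty} a_n^- = \frac{1}{2}\sum_{n=1}^{\infty} |a_n| - \frac{1}{2}\sum_{n=1}^{\infty} a_n$. Solving these two equations for $\sum_{n=1}^{\infty} a_n$ yields $\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-$.

Next, suppose that $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$ are convergent. Again using the preliminary observation about $a_n^+$ and $a_n^-$ and Theorem 9.2.6, we deduce that $\sum_{n=1}^{\infty} |a_n|$ is convergent and $\sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} a_n^+ + \sum_{n=1}^{\infty} a_n^-$. Because $\sum_{n=1}^{\infty} |a_n|$ is convergent, then $\sum_{n=1}^{\infty} a_n$ is absolutely convergent.

(2) Suppose that $\sum_{n=1}^{\infty} a_n$ is conditionally convergent. Suppose also that at least one of $\sum_{n=1}^{\infty} a_n^+ = \infty$ and $\sum_{n=1}^{\infty} a_n^- = \infty$ is false. We consider the case where $\sum_{n=1}^{\infty} a_n^+ = \infty$ is false; the other case is similar, and we omit the details. Because $a_n^+ \ge 0$ for all $n \in \mathbb{N}$, then Exercise 9.3.1 (1) implies that $\sum_{n=1}^{\infty} a_n^+$ is convergent. Using the preliminary observation about $a_n^+$ and $a_n^-$, we see that $a_n^- = a_n^+ - a_n$ for all $n \in \mathbb{N}$, and it follows from Theorem 9.2.6 that $\sum_{n=1}^{\infty} a_n^-$ is convergent. Part (1) of this lemma then implies that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, which is a contradiction.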
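The identities used at the start of the proof, and the behavior described in Part (2) of the lemma, can be observed numerically. This is a small sketch under stated assumptions: the function names `pos_part` and `neg_part`, the sample values, and the cutoffs are all illustrative choices, and the alternating harmonic series is used as the conditionally convergent example.

```python
def pos_part(a):
    """a^+ : equals a when a >= 0 and 0 otherwise."""
    return a if a >= 0 else 0.0

def neg_part(a):
    """a^- : equals -a when a <= 0 and 0 otherwise."""
    return -a if a <= 0 else 0.0

samples = [-2.5, -1.0, 0.0, 0.5, 3.0]
identities_hold = all(
    pos_part(a) - neg_part(a) == a
    and pos_part(a) + neg_part(a) == abs(a)
    and pos_part(a) == 0.5 * abs(a) + 0.5 * a
    and neg_part(a) == 0.5 * abs(a) - 0.5 * a
    for a in samples
)

def partial_sums_of_parts(n_terms):
    """Partial sums of a_n^+ and a_n^- for the alternating harmonic series."""
    terms = [(-1) ** (n - 1) / n for n in range(1, n_terms + 1)]
    return sum(pos_part(t) for t in terms), sum(neg_part(t) for t in terms)

# Both partial sums keep growing as more terms are taken,
# while their difference stays near the sum of the series.
growth = [partial_sums_of_parts(10 ** k) for k in range(1, 5)]
print(identities_hold, growth)
```

Growth over a few cutoffs is only suggestive of divergence to infinity; the lemma supplies the proof.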


The following somewhat surprising theorem, due to Georg Friedrich Bernhard 
Riemann (1826-1866), shows that rearrangements of conditionally convergent series 
are as poorly behaved as possible. 


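Theorem 9.4.15. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. Suppose that $\sum_{n=1}^{\infty} a_n$ is conditionally convergent.

1. Let $x \in \mathbb{R}$. Then there is a rearrangement $\sum_{n=1}^{\infty} d_n$ of $\sum_{n=1}^{\infty} a_n$ such that $\sum_{n=1}^{\infty} d_n$ is convergent and $\sum_{n=1}^{\infty} d_n = x$.
2. There is a rearrangement $\sum_{n=1}^{\infty} c_n$ of $\sum_{n=1}^{\infty} a_n$ such that $\sum_{n=1}^{\infty} c_n$ is divergent.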


Proof. We will prove Part (1), leaving the remaining part to the reader in Exer- 
cise 9.4.13. 


(1) If $\sum_{n=1}^{\infty} a_n$ has only finitely many non-zero terms, then it would be absolutely convergent, as the reader can verify. Hence $\sum_{n=1}^{\infty} a_n$ has infinitely many non-zero terms. By Exercise 9.4.6 we may remove all terms that are zero from $\sum_{n=1}^{\infty} a_n$ without changing the convergence or divergence of $\sum_{n=1}^{\infty} a_n$ or any rearrangement of it. Hence, we may assume without loss of generality that the series $\sum_{n=1}^{\infty} a_n$ has no zero terms. By Lemma 9.4.14 (2), we know that $\sum_{n=1}^{\infty} a_n^+ = \infty$ and $\sum_{n=1}^{\infty} a_n^- = \infty$. Hence, it must be the case that each of these two series has infinitely many non-zero terms. Let $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} c_n$ be obtained from $\sum_{n=1}^{\infty} a_n^+$ and $\sum_{n=1}^{\infty} a_n^-$, respectively, by removing all terms that are zero. By Exercise 9.4.6 again it follows that $\sum_{n=1}^{\infty} b_n = \infty$ and $\sum_{n=1}^{\infty} c_n = \infty$. Each term of $\sum_{n=1}^{\infty} a_n$ is found in precisely one of $\sum_{n=1}^{\infty} b_n$ or $\sum_{n=1}^{\infty} c_n$. We will construct the desired rearrangement of $\sum_{n=1}^{\infty} a_n$ by arranging the combined terms of $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} c_n$, as described below.

As a preliminary step, we define two sequences $\{p_n\}_{n=1}^{\infty}$ and $\{q_n\}_{n=1}^{\infty}$ in $\mathbb{N}$, and two sequences $\{y_n\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ in $\mathbb{R}$, using joint Definition by Recursion, by which we mean that we will first define $p_1$, $q_1$, $y_1$ and $z_1$, and we will then define all four of $p_{k+1}$, $q_{k+1}$, $y_{k+1}$ and $z_{k+1}$ in terms of $p_k$, $q_k$, $y_k$ and $z_k$, for all $k \in \mathbb{N}$. (This type of joint Definition by Recursion works because it is really a single Definition by Recursion in the set $\mathbb{N} \times \mathbb{N} \times \mathbb{R} \times \mathbb{R}$.) For convenience, we will use the convention that any summation of the form $\sum_{i=c}^{d}$ with $c > d$ is taken to be zero.

First, because $\sum_{n=1}^{\infty} b_n = \infty$, there is some $p \in \mathbb{N}$ such that $\sum_{i=1}^{p} b_i > x$. By the Well-Ordering Principle (Theorem 1.2.10, Axiom 1.4.4 or Theorem 2.4.6), we can find the smallest such natural number $p$, which we will call $p_1$. Hence $\sum_{i=1}^{p_1 - 1} b_i \le x < \sum_{i=1}^{p_1} b_i$. Let $y_1 = \sum_{i=1}^{p_1} b_i$. It follows that $y_1 - b_{p_1} \le x < y_1$, and therefore $0 < y_1 - x \le b_{p_1}$. Similarly, because $\sum_{n=1}^{\infty} c_n = \infty$, there is a smallest $q_1 \in \mathbb{N}$ such that $\sum_{i=1}^{q_1} c_i > y_1 - x$. Hence $\sum_{i=1}^{q_1} c_i > y_1 - x \ge \sum_{i=1}^{q_1 - 1} c_i$. Let $z_1 = y_1 - \sum_{i=1}^{q_1} c_i$. It follows that $0 < x - z_1 \le c_{q_1}$.

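Second, suppose that we have defined $p_k$, $q_k$, $y_k$ and $z_k$ for some $k \in \mathbb{N}$, and that $0 < y_k - x \le b_{p_k}$ and $0 < x - z_k \le c_{q_k}$. We define $p_{k+1}, q_{k+1} \in \mathbb{N}$ and $y_{k+1}, z_{k+1} \in \mathbb{R}$ as follows. By Exercise 9.2.12 we see that $\sum_{n=p_k+1}^{\infty} b_n = \infty$ and $\sum_{n=q_k+1}^{\infty} c_n = \infty$. Then there is a smallest $p_{k+1} \in \mathbb{N}$ such that $p_{k+1} > p_k$ and $\sum_{i=p_k+1}^{p_{k+1}} b_i > x - z_k$. Hence $\sum_{i=p_k+1}^{p_{k+1}} b_i > x - z_k \ge \sum_{i=p_k+1}^{p_{k+1}-1} b_i$. Let $y_{k+1} = z_k + \sum_{i=p_k+1}^{p_{k+1}} b_i$. It follows that $0 < y_{k+1} - x \le b_{p_{k+1}}$. There is also a smallest $q_{k+1} \in \mathbb{N}$ such that $q_{k+1} > q_k$ and $\sum_{i=q_k+1}^{q_{k+1}} c_i > y_{k+1} - x$. Hence $\sum_{i=q_k+1}^{q_{k+1}} c_i > y_{k+1} - x \ge \sum_{i=q_k+1}^{q_{k+1}-1} c_i$. Let $z_{k+1} = y_{k+1} - \sum_{i=q_k+1}^{q_{k+1}} c_i$. It follows that $0 < x - z_{k+1} \le c_{q_{k+1}}$.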

We have now defined sequences $\{p_n\}_{n=1}^{\infty}$, $\{q_n\}_{n=1}^{\infty}$, $\{y_n\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$, such that $0 < y_n - x \le b_{p_n}$ and $0 < x - z_n \le c_{q_n}$ for all $n \in \mathbb{N}$. Moreover, it is seen by the above construction that $p_n < p_{n+1}$ and $q_n < q_{n+1}$ for all $n \in \mathbb{N}$, and it follows from the analog of Exercise 8.3.1 for strictly increasing sequences that $\{p_n\}_{n=1}^{\infty}$ and $\{q_n\}_{n=1}^{\infty}$ are strictly increasing.

Because $\sum_{n=1}^{\infty} a_n$ is convergent, we know by the Divergence Test (Theorem 9.2.5) that $\lim_{n \to \infty} a_n = 0$. It follows from Exercise 8.2.13 (2) that $\lim_{n \to \infty} |a_n| = 0$. By the definition of $\{b_n\}_{n=1}^{\infty}$ and $\{c_n\}_{n=1}^{\infty}$, we see that both of these sequences are subsequences of $\{|a_n|\}_{n=1}^{\infty}$, and we deduce from Lemma 8.3.7 that $\lim_{n \to \infty} b_n = 0$ and $\lim_{n \to \infty} c_n = 0$. Because $\{b_{p_n}\}_{n=1}^{\infty}$ and $\{c_{q_n}\}_{n=1}^{\infty}$ are subsequences of $\{b_n\}_{n=1}^{\infty}$ and $\{c_n\}_{n=1}^{\infty}$, respectively, using Lemma 8.3.7 again we conclude that $\lim_{n \to \infty} b_{p_n} = 0$ and $\lim_{n \to \infty} c_{q_n} = 0$.

The inequalities $0 < y_n - x \le b_{p_n}$ and $0 < x - z_n \le c_{q_n}$ for all $n \in \mathbb{N}$, together with Example 8.2.4 (1) and the Squeeze Theorem for Sequences (Theorem 8.2.12), now imply that $\lim_{n \to \infty} (y_n - x) = 0$ and $\lim_{n \to \infty} (x - z_n) = 0$. By Exercise 8.2.5 we deduce that $\lim_{n \to \infty} y_n = x$ and $\lim_{n \to \infty} z_n = x$.


We now define a rearrangement $\sum_{n=1}^{\infty} d_n$ of $\sum_{n=1}^{\infty} a_n$ by

\[
\sum_{n=1}^{\infty} d_n = b_1 + \cdots + b_{p_1} - c_1 - \cdots - c_{q_1} + b_{p_1+1} + \cdots + b_{p_2} - c_{q_1+1} - \cdots - c_{q_2} + \cdots.
\]

We will show that $\sum_{n=1}^{\infty} d_n$ is convergent and $\sum_{n=1}^{\infty} d_n = x$. Let $\{s_n\}_{n=1}^{\infty}$ be the sequence of partial sums of $\sum_{n=1}^{\infty} d_n$. Because the terms in the sequences $\{b_n\}_{n=1}^{\infty}$ and $\{c_n\}_{n=1}^{\infty}$ are all positive, a look at the definition of the sequences $\{y_n\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ shows that $s_1 = b_1$, and that the values of $s_n$ are increasing until $s_{p_1} = y_1$; the values of $s_n$ are then decreasing until $s_{p_1+q_1} = z_1$; the values of $s_n$ are then increasing until $s_{p_1+q_1+p_2} = y_2$; the values of $s_n$ are then decreasing until $s_{p_1+q_1+p_2+q_2} = z_2$, and
so on. It follows that if mn = pi +qi+-:-+q, +i for some k € NU {0} and ie 
{1,..., px}, then zy_1 < 5» < yg (where for convenience we let p; +. 41+ pot+qo =0 
and zo = 0), and ifn = py +q1+---+pz+i for some k € NU {0} andi€ {1,..., ax}, 
then z, < sy < yg (where for convenience we let pj + qi + po = 0). Every s, falls 
precisely into one of these two cases, and we can put the two cases together to deduce that $\min\{z_{k-1}, z_k\} \le s_n \le y_k$ for all $n \in \mathbb{N}$. We saw above that $\lim_{n \to \infty} y_n = x$ and $\lim_{n \to \infty} z_n = x$. It follows from Exercise 8.2.4 that $\lim_{n \to \infty} z_{n-1} = x$, and it follows from Exercise 8.2.8 that $\{\min\{z_{n-1}, z_n\}\}_{n=1}^{\infty}$ is convergent and $\lim_{n \to \infty} \min\{z_{n-1}, z_n\} = x$. Finally, the inequality $\min\{z_{k-1}, z_k\} \le s_n \le y_k$ for all $n \in \mathbb{N}$, together with the Squeeze Theorem for Sequences (Theorem 8.2.12), implies that $\lim_{n \to \infty} s_n = x$, from which we conclude that $\sum_{n=1}^{\infty} d_n = x$.
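The construction in the proof of Part (1) is effectively an algorithm: add positive terms until the running sum exceeds $x$, then subtract terms until it drops below $x$, and repeat. The following sketch applies that idea to the alternating harmonic series, where $b_i = \frac{1}{2i-1}$ and $c_i = \frac{1}{2i}$. The function name, the target value and the number of terms are illustrative assumptions, and floating-point arithmetic stands in for exact real numbers.

```python
from itertools import count

def riemann_rearrangement(x, n_terms):
    """Greedy rearrangement of the alternating harmonic series aiming at x.

    b_i = 1/(2i-1) are the positive terms and c_i = 1/(2i) are the absolute
    values of the negative terms. Positive terms are added until the running
    sum exceeds x, then negative terms until it drops below x, and so on.
    """
    pos = (1.0 / (2 * i - 1) for i in count(1))  # b_1, b_2, ...
    neg = (1.0 / (2 * i) for i in count(1))      # c_1, c_2, ...
    terms, total = [], 0.0
    adding = True                                # start with a block of b's
    while len(terms) < n_terms:
        if adding:
            term = next(pos)
            total += term
            terms.append(term)
            if total > x:                        # block ends at some y_k, just above x
                adding = False
        else:
            term = -next(neg)
            total += term
            terms.append(term)
            if total < x:                        # block ends at some z_k, just below x
                adding = True
    return terms, total

target = 2.0
terms, total = riemann_rearrangement(target, 50_000)
print(total)
```

Calling `riemann_rearrangement(0.0, 15)` produces the kind of list requested in Exercise 9.4.12, though reading off a pattern from it is still left to the reader.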


Reflections 


Other than the discussion of the Ratio Test, which is indispensable for working 
with power series, much of this section (for example, the discussion of products and 
rearrangements of series) is skipped in some introductory real analysis courses. Doing 
so is unfortunate, however, because this material contains the one truly surprising 
result, as well as the most substantial proof, in this chapter, namely, Theorem 9.4.15 
and its proof. It is often these surprises that make mathematics as interesting as it 
is—it would be boring to study any subject where everything was already known on 
an intuitive basis beforehand. Moreover, if everything that seemed intuitively true 
turned out to be indeed true, we could simply do everything on an intuitive basis and 
be sure we were correct. It is the existence of counterintuitive facts that requires us to 
proceed with utmost rigor. 


Exercises 


Exercise 9.4.1. [Used in Example 9.2.7 and Example 9.4.9.] Prove that the series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{\sqrt{n}}$ is conditionally convergent.


Exercise 9.4.2. [Used in Exercise 9.4.9 and Exercise 10.4.10.] Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. Prove that if $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then $\bigl|\sum_{n=1}^{\infty} a_n\bigr| \le \sum_{n=1}^{\infty} |a_n|$.


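Exercise 9.4.3. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$. Prove that if $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then $\sum_{n=1}^{\infty} (a_n)^2$ is convergent.


Exercise 9.4.4. Find an example of a series $\sum_{n=1}^{\infty} a_n$ such that $\sum_{n=1}^{\infty} a_n$ is convergent, but such that $\sum_{n=1}^{\infty} (a_n)^2$ is divergent.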


Exercise 9.4.5. [Used in Theorem 10.4.4.] Let $a, r \in \mathbb{R}$. Suppose that $a \ne 0$. Prove that the series $\sum_{n=1}^{\infty} n a r^{n-1}$ is convergent if and only if $|r| < 1$.


Exercise 9.4.6. [Used in Theorem 9.4.15.] Let $\sum_{n=1}^{\infty} b_n$ be a series in $\mathbb{R}$. Suppose that $\sum_{n=1}^{\infty} b_n$ has infinitely many non-zero terms. Let $\sum_{n=1}^{\infty} c_n$ be the series obtained by removing all terms that are zero from $\sum_{n=1}^{\infty} b_n$. Prove that $\sum_{n=1}^{\infty} b_n$ is absolutely convergent, conditionally convergent, divergent or diverges to infinity if and only if $\sum_{n=1}^{\infty} c_n$ is absolutely convergent, conditionally convergent, divergent or diverges to infinity, respectively. [Use Exercise 8.3.7.]


Exercise 9.4.7. Let $\sum_{n=1}^{\infty} a_n$ be a series in $\mathbb{R}$, and let $\{b_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$. Prove that if $\sum_{n=1}^{\infty} a_n$ is absolutely convergent and $\{b_n\}_{n=1}^{\infty}$ is bounded, then $\sum_{n=1}^{\infty} a_n b_n$ is absolutely convergent.


Exercise 9.4.8. [Used in Theorem 9.4.4.] Prove Theorem 9.4.4 (2). 


Exercise 9.4.9. [Used in Exercise 9.5.9.] Let 7° a, be a series in R. Suppose that 
yr-1n is absolutely convergent, and that for each Q € N, the series )’"_, or is 
convergent and )""_, on = 0. Prove that a, = 0 for alln € N. 

To prove the result, suppose to the contrary that $a_n \ne 0$ for some $n \in \mathbb{N}$. By the Well-Ordering Principle (Theorem 1.2.10, Axiom 1.4.4 or Theorem 2.4.6), there is a smallest $p \in \mathbb{N}$ such that $a_p \ne 0$. Let $\varepsilon > 0$. Prove that $|a_p| < \varepsilon$, and derive a contradiction. [Use Exercise 9.4.2.]


Exercise 9.4.10. [Used in Section 9.4.] Find an example of series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ in $\mathbb{R}$ such that $\sum_{n=0}^{\infty} a_n$ is absolutely convergent and $\sum_{n=0}^{\infty} b_n$ is conditionally convergent, and that the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is conditionally convergent.


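Exercise 9.4.11. [Used in Section 9.4.] Let $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ be series in $\mathbb{R}$. Let $\sum_{n=0}^{\infty} e_n$ be the Cauchy product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$. Suppose that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are absolutely convergent. Prove that $\sum_{n=0}^{\infty} e_n$ is absolutely convergent. There is no need to construct a proof analogous to that of Theorem 9.4.7; in fact, make use of that theorem applied to $\sum_{n=0}^{\infty} |a_n|$ and $\sum_{n=0}^{\infty} |b_n|$. [Use Exercise 2.5.3.]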


Exercise 9.4.12. It was seen in Example 9.4.2 (2) that the alternating harmonic series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$ is conditionally convergent. It follows from Theorem 9.4.15 (1) that there is a rearrangement of this series that has sum equal to 0. Using the method of the proof of that theorem, write out the first 15 terms of such a rearrangement. Is there a pattern to the rearrangement? It is not necessary to prove that the pattern holds.


Exercise 9.4.13. [Used in Theorem 9.4.15.] Prove Theorem 9.4.15 (2).

Up till now in this chapter we have looked at series of numbers, for example


: =e : = : “F 
1 2 3 
We now turn to series with a “variable” in them. There are many different types of 


such series, for example 


1 2 3 . : . 
-+— 7+ 5+4+::: and sinax+sin2ax+sin3ax+-::-. 
x 7 3 


Although both of the above types of series are useful (the former in complex analysis 
and the latter in physics, among other uses), we will restrict our attention to the 
simplest and most widely used form of series with a “variable,” which are power 
series, for example, the series 
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2 3 4 5 6 : 


Power series are a generalization of polynomials. 

As we stated when we defined polynomials in Section 2.5, there is actually no 
such thing as a “variable.” The symbol “x” that we used in the above series simply 
represents an element of R. We do not happen to know the numerical value of the 
element “x,” but that does not make it any more variable than any other letter such as 
“c” that represents an element in R. As such, the above series with “x” are also series 
with terms that are numbers. To avoid the issue of “variables,” we will view power 
series as functions, analogously to the definition of polynomials in Definition 2.5.10. 


Definition 9.5.1. Let $A \subseteq \mathbb{R}$ be a set, and let $f \colon A \to \mathbb{R}$ be a function. The function $f$ is a power series if there is some $a \in \mathbb{R}$ and a sequence $\{c_n\}_{n=0}^{\infty}$ in $\mathbb{R}$ such that $\sum_{n=0}^{\infty} c_n (x - a)^n$ is convergent for all $x \in A$ and $f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$ for all $x \in A$. △


We will often consider power series where $a = 0$, that is, power series of the form $\sum_{n=0}^{\infty} c_n x^n$.

Although we defined power series as functions in Definition 9.5.1, in practice it is customary to define a power series simply by saying “let $\sum_{n=0}^{\infty} c_n (x - a)^n$ be a power series” without referring to the power series as a function, and in particular without stating a domain and a codomain. Fortunately, no real problem arises from this informal way of defining power series, and so for convenience we will use it, though with the following caveat. If we say “let $\sum_{n=0}^{\infty} c_n (x - a)^n$ be a power series,” then we are to think of such an expression as defining a function, where the domain is assumed to be the set $A = \{x \in \mathbb{R} \mid \sum_{n=0}^{\infty} c_n (x - a)^n \text{ is convergent}\}$, which makes sense when we realize that “x” is not a “variable,” and where the codomain is assumed to be $\mathbb{R}$. We could also take any subset of $A$ to be the domain, but we will not do so unless otherwise stated.

We now examine a few examples of power series, each presented in the customary 
way stated above; for each of these power series we will find the domain. A very 
useful tool for finding the domain of a power series is the Ratio Test (Theorem 9.4.4), 
which was stated for series of numbers, but which holds for power series as well, 
because for each value of “x” in the domain, the power series is a series with terms 
that are numbers. 


Example 9.5.2.

(1) We want to find the domain of the power series $\sum_{n=0}^{\infty} x^n$. This series has the form $\sum_{n=0}^{\infty} c_n (x - a)^n$, where $a = 0$, and where $c_n = 1$ for all $n \in \mathbb{N} \cup \{0\}$. Let $x \in \mathbb{R}$. The series $\sum_{n=0}^{\infty} x^n$ is a geometric series, as discussed in Example 9.2.4 (4), because the series has the form $\sum_{n=1}^{\infty} a r^{n-1}$, where $a = 1$ and $r = x$. We saw in Example 9.2.4 (4) that a geometric series $\sum_{n=1}^{\infty} a r^{n-1}$ is convergent if and only if $-1 < r < 1$, and that if $-1 < r < 1$ then $\sum_{n=1}^{\infty} a r^{n-1} = \frac{a}{1-r}$. Hence the power series $\sum_{n=0}^{\infty} x^n$ is convergent if and only if $-1 < x < 1$, and if $-1 < x < 1$ then $\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}$. The domain of $\sum_{n=0}^{\infty} x^n$ is therefore the open interval $(-1, 1)$.

(2) Recall the definition of factorial given in Example 2.5.12. We want to find the domain of the power series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$. Let $x \in \mathbb{R}$. We will use the Ratio Test (Theorem 9.4.4) to evaluate whether or not the series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ is convergent. However, the hypotheses of the Ratio Test include the requirement that the terms of the series are never zero, so we need to consider two cases. First, suppose that $x = 0$. Then the power series is $\sum_{n=0}^{\infty} \frac{0^n}{n!}$, which is absolutely convergent. Second, suppose that $x \ne 0$. Using Example 8.2.4 (2), Exercise 8.2.4 and Theorem 8.2.9 (3), we see that

\[
L = \lim_{n \to \infty} \left| \frac{\frac{x^{n+1}}{(n+1)!}}{\frac{x^n}{n!}} \right| = \lim_{n \to \infty} \frac{|x|}{n+1} = 0.
\]

It follows from the Ratio Test that $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ is absolutely convergent. Putting the two cases together, we deduce that the domain of $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ is $\mathbb{R}$, and that the power series is absolutely convergent for all $x \in \mathbb{R}$.

As a consequence of the above calculation, we use the Divergence Test (Theorem 9.2.5) to deduce that $\lim_{n \to \infty} \frac{x^n}{n!} = 0$ for all $x \in \mathbb{R}$. This fact is used in the proofs of Theorem 7.4.5 and Corollary 10.4.15. Normally we use sequences to prove things about series, but in this instance it worked out nicely the other way around.

(3) We want to find the domain of the power series $\sum_{n=1}^{\infty} \frac{(x-3)^n}{n 5^n}$. Let $x \in \mathbb{R}$. First, suppose that $x = 3$. Then the power series is $\sum_{n=1}^{\infty} \frac{0^n}{n 5^n}$, which is absolutely convergent. Second, suppose that $x \ne 3$. Using Example 8.2.10 and Theorem 8.2.9 (3), we see that

\[
L = \lim_{n \to \infty} \left| \frac{\frac{(x-3)^{n+1}}{(n+1) 5^{n+1}}}{\frac{(x-3)^n}{n 5^n}} \right| = \lim_{n \to \infty} \frac{n}{n+1} \cdot \frac{|x-3|}{5} = \frac{|x-3|}{5}.
\]

It follows from the Ratio Test that $\sum_{n=1}^{\infty} \frac{(x-3)^n}{n 5^n}$ is absolutely convergent when $\frac{|x-3|}{5} < 1$, and is divergent when $\frac{|x-3|}{5} > 1$; that is, the power series is absolutely convergent when $-2 < x < 8$, and is divergent when $x < -2$ and when $x > 8$. Hence the domain of $\sum_{n=1}^{\infty} \frac{(x-3)^n}{n 5^n}$ contains the open interval $(-2, 8)$, and does not intersect the set $(-\infty, -2) \cup (8, \infty)$.

It remains to be verified whether $\sum_{n=1}^{\infty} \frac{(x-3)^n}{n 5^n}$ is convergent or divergent for each of $x = -2$ and $x = 8$. The Ratio Test will not work for these two values of $x$, because they are the values of $x$ that yield $L = 1$, which is precisely when the Ratio Test is inconclusive. We therefore examine the two cases $x = 8$ and $x = -2$ individually.

If $x = 8$, the power series $\sum_{n=1}^{\infty} \frac{(x-3)^n}{n 5^n}$ is $\sum_{n=1}^{\infty} \frac{1}{n}$, which is the harmonic series, and which is divergent by Example 9.2.4 (5). If $x = -2$, the power series $\sum_{n=1}^{\infty} \frac{(x-3)^n}{n 5^n}$ is $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$, which is $-1$ times the alternating harmonic series, and which is convergent by Example 9.4.2 (2) and Theorem 8.2.9 (3).

Putting the various cases together, we see that the domain of $\sum_{n=1}^{\infty} \frac{(x-3)^n}{n 5^n}$ is $[-2, 8)$.
(4) We want to find the domain of the power series $\sum_{n=0}^{\infty} n! x^n$. Let $x \in \mathbb{R}$. First, suppose that $x = 0$. Then the power series is $\sum_{n=0}^{\infty} n! 0^n$, which is absolutely convergent. Second, suppose that $x \ne 0$. Using Exercise 8.2.15 and Exercise 8.2.17 (3), we see that

\[
L = \lim_{n \to \infty} \left| \frac{(n+1)! x^{n+1}}{n! x^n} \right| = \lim_{n \to \infty} (n+1)|x| = \infty.
\]

It follows from the Ratio Test that $\sum_{n=0}^{\infty} n! x^n$ is divergent for all $x \in \mathbb{R} - \{0\}$. Therefore the domain of $\sum_{n=0}^{\infty} n! x^n$ is $\{0\}$. ♦
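The ratios whose limits are computed in Parts (2), (3) and (4) of Example 9.5.2 can be evaluated exactly for a specific $n$. The sketch below does so with rational arithmetic; the helper names, the index $n = 200$ and the sample values of $x$ are assumptions made for illustration, and a single ratio only hints at the limit $L$.

```python
from fractions import Fraction
from math import factorial

def ratio(term, n, x):
    """|term(n+1, x) / term(n, x)|, the quantity whose limit is L in the Ratio Test."""
    return abs(term(n + 1, x) / term(n, x))

# Terms of the series in Parts (2), (3) and (4), in exact rational arithmetic.
def exp_term(n, x):        # x^n / n!
    return Fraction(x) ** n / factorial(n)

def shifted_term(n, x):    # (x - 3)^n / (n * 5^n), defined for n >= 1
    return (Fraction(x) - 3) ** n / (n * 5 ** n)

def factorial_term(n, x):  # n! * x^n
    return factorial(n) * Fraction(x) ** n

n = 200
r_exp = float(ratio(exp_term, n, 10))                        # equals |x|/(n+1)
r_shift = float(ratio(shifted_term, n, 7))                   # equals (n/(n+1)) * |x-3|/5
r_fact = float(ratio(factorial_term, n, Fraction(1, 100)))   # equals (n+1)|x|
print(r_exp, r_shift, r_fact)
```

Increasing `n` drives the first ratio toward 0 and the second toward $\frac{|x-3|}{5}$, while the third grows without bound for any non-zero $x$, in line with the three limits found in the example.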


We observe that in all of the parts of Example 9.5.2, the domain of the series was an interval in $\mathbb{R}$, where we think of $\mathbb{R}$ as the interval $(-\infty, \infty)$, and we think of $\{0\}$ as the interval $[0, 0]$. This observation turns out not to be a coincidence. Rather remarkably, no matter how strange a power series might seem, its domain must be an interval, as we will prove in Theorem 9.5.4 below. We start with a lemma.


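Lemma 9.5.3. Let $\sum_{n=0}^{\infty} c_n x^n$ be a power series in $\mathbb{R}$, and let $p, q \in \mathbb{R}$. If $\sum_{n=0}^{\infty} c_n q^n$ is convergent, and if $|p| < |q|$, then $\sum_{n=0}^{\infty} c_n p^n$ is absolutely convergent.


Proof. Suppose that $\sum_{n=0}^{\infty} c_n q^n$ is convergent, and that $|p| < |q|$. Observe that $q \ne 0$.

Because $\sum_{n=0}^{\infty} c_n q^n$ is convergent, then by Exercise 9.2.2 we know that $\{c_n q^n\}_{n=0}^{\infty}$ is bounded. Hence there is some $M \in \mathbb{R}$ such that $|c_n q^n| \le M$ for all $n \in \mathbb{N} \cup \{0\}$. Then

\[
|c_n p^n| = |c_n q^n| \cdot \left| \frac{p}{q} \right|^n \le M \left| \frac{p}{q} \right|^n
\]

for all $n \in \mathbb{N} \cup \{0\}$. The series $\sum_{n=0}^{\infty} M \left| \frac{p}{q} \right|^n$ is a geometric series, as discussed in Example 9.2.4 (4), and by that example the series is convergent because $\left| \frac{p}{q} \right| < 1$. The Comparison Test (Theorem 9.3.2) now implies that $\sum_{n=0}^{\infty} |c_n p^n|$ is convergent, which means that $\sum_{n=0}^{\infty} c_n p^n$ is absolutely convergent.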
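The comparison at the heart of the proof of Lemma 9.5.3 can be seen in a concrete case. The sketch below uses the coefficients $c_n = \frac{(-1)^n}{n}$ for $n \ge 1$ with $c_0 = 0$, $q = 1$ and $p = -0.9$; these choices, and the finite range over which the bound $M$ is computed, are assumptions made for illustration rather than anything taken from the text.

```python
def c(n):
    """Illustrative coefficients: c_0 = 0 and c_n = (-1)^n / n for n >= 1."""
    return 0.0 if n == 0 else (-1) ** n / n

q = 1.0   # sum c_n q^n is -1 times the alternating harmonic series, so it converges
p = -0.9  # any p with |p| < |q| would do
N = 2000  # number of terms checked (arbitrary)

# A bound on |c_n q^n| over the range checked.
M = max(abs(c(n) * q ** n) for n in range(N))

# Termwise comparison |c_n p^n| <= M |p/q|^n used in the proof.
termwise_ok = all(abs(c(n) * p ** n) <= M * abs(p / q) ** n for n in range(N))

abs_partial_sum = sum(abs(c(n) * p ** n) for n in range(N))
geometric_bound = M / (1 - abs(p / q))  # sum of M |p/q|^n over n >= 0
print(termwise_ok, abs_partial_sum, geometric_bound)
```

The partial sums of $\sum |c_n p^n|$ stay below the sum of the dominating geometric series, which is what the Comparison Test needs.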


Theorem 9.5.4. Let $\sum_{n=0}^{\infty} c_n (x - a)^n$ be a power series in $\mathbb{R}$. Then precisely one of the following holds:


(1) the power series is absolutely convergent for all $x \in \mathbb{R}$;

(2) the power series is convergent only for $x = a$, where it is absolutely convergent;

(3) there is some $R \in (0, \infty)$ such that the power series is absolutely convergent for all $x \in (a - R, a + R)$, and the power series is divergent for all $x \in \mathbb{R} - [a - R, a + R]$.


Proof. It is evident that no two of the three options hold simultaneously.

Suppose that Options (1) and (2) do not hold. We will show that Option (3) holds. First, suppose that $a = 0$. Hence the power series is $\sum_{n=0}^{\infty} c_n x^n$. Because Options (1) and (2) do not hold, there are $r, p \in \mathbb{R}$ such that $\sum_{n=0}^{\infty} c_n r^n$ is not absolutely convergent, and such that $p \ne 0$ and $\sum_{n=0}^{\infty} c_n p^n$ is convergent. Let $q = |r| + 1$. Then $|q| > |r|$, and by Lemma 9.5.3 it must be the case that $\sum_{n=0}^{\infty} c_n q^n$ is divergent.

Let 


S= {: ER| » Cnx" is convergent}. 
n=0 
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Then p € Sand q ¢ S. Hence S 4 @. Let y € R. Suppose that |y| > |q|. It follows from 
Lemma 9.5.3 that )""_)cny" is divergent. Hence S C [—|q|,|q|], and therefore S is 
bounded. The Least Upper Bound Property implies that S has a least upper bound. Let 
R=lubS. Because |g| is an upper bound of S, then R < |g|. We know that 1" _y cpp” is 


n 
convergent, and hence it follows from Lemma 9.5.3 that YP ¢n ( ipl) is convergent. 


Therefore Bl € S. Because p # 0 we deduce that R > 0. 

Let $x \in (-R, R)$. Then $|x| < R$, and hence $|x|$ is not an upper bound of $S$. Therefore
there is some $z \in S$ such that $|x| < z$. Clearly $z > 0$, and hence $|x| < |z|$. By the
definition of $S$ we know $\sum_{n=0}^{\infty} c_n z^n$ is convergent. By Lemma 9.5.3 it follows that
$\sum_{n=0}^{\infty} c_n x^n$ is absolutely convergent.

Now let $y \in \mathbb{R} - [-R, R]$. Then $|y| > R$. Suppose that $\sum_{n=0}^{\infty} c_n y^n$ is convergent. Let
$t = \frac{R + |y|}{2}$. Then $0 < R < t < |y|$. It follows that $|t| < |y|$, and therefore by Lemma 9.5.3
we see that $\sum_{n=0}^{\infty} c_n t^n$ is convergent. Hence $t \in S$, which is a contradiction to the fact
that $R = \operatorname{lub} S$. We conclude that $\sum_{n=0}^{\infty} c_n y^n$ is divergent.

We now turn to the general case, which involves the power series $\sum_{n=0}^{\infty} c_n (x-a)^n$,
where $a$ is not necessarily zero. By the previous case, we see that there is some
$R \in (0,\infty)$ such that this series is absolutely convergent when $x - a \in (-R, R)$, and the
series is divergent when $x - a \in \mathbb{R} - [-R, R]$. It follows immediately that the series
is absolutely convergent for all $x \in (a-R, a+R)$, and the series is divergent for all
$x \in \mathbb{R} - [a-R, a+R]$.


It is important to note that in Case (3) of Theorem 9.5.4, nothing is said about 
convergence at x = a— R and x = a+R. In fact, anything can happen at these points. 
That is, there are power series that converge precisely on (a — R,a+R), there are 
power series that converge precisely on (a—R,a+R], there are power series that 
converge precisely on [a — R,a + R) and there are power series that converge precisely 
on [a − R, a + R]. What is remarkable in Theorem 9.5.4 is that in all possible cases,
the set of real numbers for which the power series converges is precisely an interval 
of some form, as opposed to some more complicated type of subset of R, which leads 
to the following definition. 


Definition 9.5.5. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ be a power series in $\mathbb{R}$. The interval of
convergence of the power series is the set $\{x \in \mathbb{R} \mid \sum_{n=0}^{\infty} c_n (x-a)^n \text{ is convergent}\}$. The
radius of convergence of the power series is defined as follows:


(1) if the power series is absolutely convergent for all $x \in \mathbb{R}$, the radius of
convergence is $R = \infty$;

(2) if the power series is convergent only for $x = a$, the radius of convergence is
$R = 0$;

(3) if there is some $R \in (0,\infty)$ such that the power series is absolutely convergent
for all $x \in (a-R, a+R)$, and the power series is divergent for all
$x \in \mathbb{R} - [a-R, a+R]$, the radius of convergence is $R$. △


For convenience, we will consider the symbol ∞ to be greater than any real number,
though ∞ should not be thought of as a real number, but simply as a useful symbol.
Suppose that a power series has radius of convergence R. If we say that “R > 0,”
we mean that R is a positive real number or R = ∞; if we say that “R < ∞,” we mean
that R is a positive real number or R = 0. Observe that R > 0 if and only if the interval
of convergence is non-degenerate. Also, if R = ∞, then we will write (a − R, a + R) to
mean (−∞, ∞) = ℝ, which allows us to avoid special cases.


Example 9.5.6. We restate the results of Example 9.5.2 in terms of radius of
convergence and interval of convergence. For Part (1) of that example, the radius of
convergence is 1 and the interval of convergence is (−1, 1); for Part (2), the radius of
convergence is ∞ and the interval of convergence is (−∞, ∞); for Part (3), the radius
of convergence is 5 and the interval of convergence is [−2, 8); for Part (4), the radius
of convergence is 0 and the interval of convergence is [0, 0]. ◊


Theorem 9.5.4 tells us that in principle the domain of any power series is an 
interval, but it does not tell us how to find the interval of convergence. In practice, the 
most common way to find the interval of convergence of a power series is what we 
did in Example 9.5.2 (3), which is to use the Ratio Test (Theorem 9.4.4) to find the 
radius of convergence, and then to check the endpoints of the potential interval of 
convergence individually. However, as seen in the following example, there are some 
power series for which the Ratio Test does not work, though such power series are 
not encountered very often in practice. 
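
The first step of this procedure can be explored numerically. The following Python sketch is only an illustration under stated assumptions: the function name `ratio_estimate` and the sample coefficients $c_n = \frac{1}{n 5^n}$ are hypothetical choices made here, and are not taken from Example 9.5.2. When $\lim_{n \to \infty} |c_n / c_{n+1}|$ exists, the Ratio Test yields that limit as the radius of convergence; the sketch evaluates the quotient at a single large index, which suggests a value but proves nothing, and the endpoints must still be checked by hand.

```python
from fractions import Fraction


def ratio_estimate(coeff, n):
    """Return |c_n / c_(n+1)|, a finite-n estimate of the radius of convergence.

    Assumes coeff(n) and coeff(n + 1) are nonzero.
    """
    return abs(coeff(n) / coeff(n + 1))


def sample_coeff(n):
    """Hypothetical coefficients c_n = 1 / (n * 5^n), defined for n >= 1."""
    return Fraction(1, n * 5 ** n)


# For these coefficients the quotient is exactly 5(n + 1)/n, which tends to 5.
estimate = ratio_estimate(sample_coeff, 1000)
print(float(estimate))
```

Exact rational arithmetic is used so that the large powers of 5 do not overflow or lose precision.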


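Example 9.5.7. We want to find the radius of convergence and the interval of
convergence of the power series

$$1 + x + \frac{x^2}{2^2} + \frac{x^3}{2^2} + \frac{x^4}{2^4} + \frac{x^5}{2^4} + \cdots.$$

We can write this power series as $\sum_{n=0}^{\infty} c_n x^n$, where

$$c_n = \begin{cases} \dfrac{1}{2^n}, & \text{if } n \text{ is even} \\[1ex] \dfrac{1}{2^{n-1}}, & \text{if } n \text{ is odd.} \end{cases}$$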


Let $x \in \mathbb{R}$. If $x = 0$, then it is evident that $\sum_{n=0}^{\infty} c_n x^n$ is convergent. Now suppose that
$x \neq 0$. If we attempt to use the Ratio Test (Theorem 9.4.4) to evaluate whether or not
$\sum_{n=0}^{\infty} c_n x^n$ is convergent, we would see that

$$\left|\frac{c_{n+1} x^{n+1}}{c_n x^n}\right| = \begin{cases} |x|, & \text{if } n \text{ is even} \\[1ex] \dfrac{|x|}{4}, & \text{if } n \text{ is odd.} \end{cases}$$

By Exercise 8.2.2 we know that $\left\{ \left|\frac{c_{n+1} x^{n+1}}{c_n x^n}\right| \right\}_{n=1}^{\infty}$ is divergent, because we are
assuming that $x \neq 0$, and hence we cannot use the Ratio Test to evaluate the convergence of
$\sum_{n=0}^{\infty} c_n x^n$.

Fortunately, it is still possible to determine where $\sum_{n=0}^{\infty} c_n x^n$ is convergent, as
follows. Let $x \in \mathbb{R}$. We first look at the series $\sum_{n=0}^{\infty} |c_n x^n| = \sum_{n=0}^{\infty} c_n |x|^n$, to test for
absolute convergence. Observe that if $n \in \mathbb{N} \cup \{0\}$, then

$$\frac{|x|^n}{2^n} \le c_n |x|^n \le \frac{|x|^n}{2^{n-1}}.$$

The series $\sum_{n=0}^{\infty} \frac{|x|^n}{2^n}$ and $\sum_{n=0}^{\infty} \frac{|x|^n}{2^{n-1}}$ are both geometric series with $r = \frac{|x|}{2}$, as discussed
in Example 9.2.4 (4), and therefore both of these series are convergent if and only
if $\frac{|x|}{2} < 1$. Hence, these two series are convergent if and only if $x \in (-2, 2)$. It then
follows from the Comparison Test (Theorem 9.3.2) that the series $\sum_{n=0}^{\infty} |c_n x^n|$ is
convergent if and only if $x \in (-2, 2)$, which means that $\sum_{n=0}^{\infty} c_n x^n$ is absolutely
convergent if and only if $x \in (-2, 2)$. Theorem 9.5.4 then implies that the radius of
convergence of $\sum_{n=0}^{\infty} c_n x^n$ is 2.

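It remains to be verified whether $\sum_{n=0}^{\infty} c_n x^n$ is convergent or divergent for each of
$x = -2$ and $x = 2$. If $x = 2$, the power series is

$$1 + 2 + \frac{2^2}{2^2} + \frac{2^3}{2^2} + \frac{2^4}{2^4} + \frac{2^5}{2^4} + \cdots = 1 + 2 + 1 + 2 + 1 + 2 + \cdots,$$

and this series is divergent by the Divergence Test (Theorem 9.2.5). If $x = -2$, the
power series is

$$1 - 2 + \frac{2^2}{2^2} - \frac{2^3}{2^2} + \frac{2^4}{2^4} - \frac{2^5}{2^4} + \cdots = 1 - 2 + 1 - 2 + 1 - 2 + \cdots,$$

and again this series is divergent.

We therefore see that the interval of convergence of $\sum_{n=0}^{\infty} c_n x^n$ is $(-2, 2)$. ◊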
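
The oscillation of the ratios in Example 9.5.7, and the behavior of the terms at the endpoint x = 2, can be checked by direct computation. The following Python sketch is an illustration only; the helper names `c` and `ratio` are chosen here for convenience, and the coefficients are taken to be 1/2^n for even n and 1/2^(n−1) for odd n, as in the example. A finite list of values does not prove divergence of the sequence of ratios, but it shows the pattern that the proof exploits.

```python
from fractions import Fraction


def c(n):
    """Coefficient c_n of Example 9.5.7: 1/2^n for even n, 1/2^(n-1) for odd n."""
    exponent = n if n % 2 == 0 else n - 1
    return Fraction(1, 2 ** exponent)


def ratio(n, x):
    """Return |c_(n+1) x^(n+1)| / |c_n x^n| for a nonzero x."""
    return abs(c(n + 1) * x ** (n + 1)) / abs(c(n) * x ** n)


# The ratios alternate between |x| (n even) and |x|/4 (n odd), so they have no limit.
x = Fraction(3, 2)
ratios = [ratio(n, x) for n in range(6)]

# At the endpoint x = 2 the terms are 1, 2, 1, 2, ..., which do not tend to 0.
terms_at_2 = [c(n) * 2 ** n for n in range(6)]
```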


The calculation of the interval of convergence in Example 9.5.7 was based upon 
the Comparison Test (Theorem 9.3.2), which we could use because of the lucky (and 
rare) resemblance between the power series and certain geometric series. It would 
be nice to have a more systematic way of computing the interval of convergence of 
power series in those cases where the Ratio Test (Theorem 9.4.4) fails. A commonly 
used method for such calculations involves the Root Test, which in turn uses the 
notion of the limit superior of a sequence. We will not discuss either of these topics, 
because we will not otherwise need them. See [Sto01, Section 2.5] for limit superior, 
[Sto01, Section 7.1] for the Root Test in general and [Sto01, Section 8.7] for the use 
of the Root Test for finding the radius of convergence of power series. 

When working with polynomials, the reader has probably made use of the fact 
that the coefficients of a polynomial are unique; that is, if two polynomials are equal, 
then their coefficients are equal. We now state the analogous result for power series, 
as long as the power series have positive radius of convergence. 


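Theorem 9.5.8. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ and $\sum_{n=0}^{\infty} d_n (x-a)^n$ be power series in $\mathbb{R}$.
Suppose that each of $\sum_{n=0}^{\infty} c_n (x-a)^n$ and $\sum_{n=0}^{\infty} d_n (x-a)^n$ has a positive radius
of convergence. Let $I_c$ and $I_d$ be the intervals of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$ and
$\sum_{n=0}^{\infty} d_n (x-a)^n$, respectively. If $\sum_{n=0}^{\infty} c_n (x-a)^n = \sum_{n=0}^{\infty} d_n (x-a)^n$ for all $x \in I_c \cap I_d$,
then $c_n = d_n$ for all $n \in \mathbb{N} \cup \{0\}$.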


Proof. Left to the reader in Exercise 9.5.9.


We conclude this section with two theorems about the convergence of sums,
differences and products of power series. The first theorem follows immediately from
Theorem 9.2.6, and we omit the proof.


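Theorem 9.5.9. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ and $\sum_{n=0}^{\infty} d_n (x-a)^n$ be power series in $\mathbb{R}$, let
$k \in \mathbb{R}$ and let $r \in \mathbb{N}$. Let $I_c$ and $I_d$ be the intervals of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$
and $\sum_{n=0}^{\infty} d_n (x-a)^n$, respectively.


1. $\sum_{n=0}^{\infty} (c_n + d_n)(x-a)^n$ is convergent for all $x \in I_c \cap I_d$, and

$$\sum_{n=0}^{\infty} (c_n + d_n)(x-a)^n = \sum_{n=0}^{\infty} c_n (x-a)^n + \sum_{n=0}^{\infty} d_n (x-a)^n$$

for all $x \in I_c \cap I_d$.
2. $\sum_{n=0}^{\infty} (c_n - d_n)(x-a)^n$ is convergent for all $x \in I_c \cap I_d$, and

$$\sum_{n=0}^{\infty} (c_n - d_n)(x-a)^n = \sum_{n=0}^{\infty} c_n (x-a)^n - \sum_{n=0}^{\infty} d_n (x-a)^n$$

for all $x \in I_c \cap I_d$.
3. $\sum_{n=0}^{\infty} k c_n (x-a)^{n+r}$ is convergent for all $x \in I_c$, and

$$\sum_{n=0}^{\infty} k c_n (x-a)^{n+r} = k (x-a)^r \sum_{n=0}^{\infty} c_n (x-a)^n$$

for all $x \in I_c$.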


Our next theorem, concerning products of power series, is less straightforward 
than Theorem 9.5.9. Intuitively, it would be reasonable to expect that multiplying 
power series is analogous to multiplying polynomials. For example, we see that 


$$[a_0 + a_1 x + a_2 x^2] \cdot [b_0 + b_1 x + b_2 x^2]
= a_0 b_0 + (a_0 b_1 + a_1 b_0)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + (a_1 b_2 + a_2 b_1)x^3 + a_2 b_2 x^4.$$


The first three terms of this product are the typical ones, whereas the $x^3$ and $x^4$ terms
have coefficients that do not quite fit the pattern, because the original polynomials
stopped at $x^2$, in contrast to power series. The product of two power series works
analogously, with one caveat. The issue is the difference between absolute convergence 
and conditional convergence. As we saw in Section 9.4, absolutely convergent series 
of numbers are better behaved than conditionally convergent series with such terms, 
and in particular the multiplication of absolutely convergent series is better behaved 
than the multiplication of conditionally convergent series. Hence, as we now see, the 
product of two power series behaves nicely if we restrict attention to the interval of 
convergence with the endpoints removed, to ensure absolute convergence. 

The proof of the following theorem appears simple, but that is because we did the
hard work for it in the proof of Theorem 9.4.7.


Theorem 9.5.10. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ and $\sum_{n=0}^{\infty} d_n (x-a)^n$ be power series in $\mathbb{R}$.
Let $R_c$ and $R_d$ be the radii of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$ and $\sum_{n=0}^{\infty} d_n (x-a)^n$,
respectively. Let $R = \min\{R_c, R_d\}$. For each $n \in \mathbb{N} \cup \{0\}$, let $e_n = \sum_{k=0}^{n} c_k d_{n-k}$. The
power series $\sum_{n=0}^{\infty} e_n (x-a)^n$ is absolutely convergent for all $x \in (a-R, a+R)$, and
$\left[\sum_{n=0}^{\infty} c_n (x-a)^n\right] \cdot \left[\sum_{n=0}^{\infty} d_n (x-a)^n\right] = \sum_{n=0}^{\infty} e_n (x-a)^n$ for all $x \in (a-R, a+R)$.


Proof. If $R = 0$ then we are concerned only with convergence at $x = a$, and the
result is trivial in that case. Now assume that $R > 0$. Let $y \in (a-R, a+R)$. Then
$\sum_{n=0}^{\infty} c_n (y-a)^n$ and $\sum_{n=0}^{\infty} d_n (y-a)^n$ are absolutely convergent. Because
$\sum_{k=0}^{n} c_k (y-a)^k \cdot d_{n-k} (y-a)^{n-k} = e_n (y-a)^n$ for all $n \in \mathbb{N} \cup \{0\}$, it follows from Theorem 9.4.7
that $\sum_{n=0}^{\infty} e_n (y-a)^n$ is convergent and that
$\left[\sum_{n=0}^{\infty} c_n (y-a)^n\right] \cdot \left[\sum_{n=0}^{\infty} d_n (y-a)^n\right] = \sum_{n=0}^{\infty} e_n (y-a)^n$.

Because $\sum_{n=0}^{\infty} e_n (x-a)^n$ is convergent for all $x \in (a-R, a+R)$, then the radius
of convergence of $\sum_{n=0}^{\infty} e_n (x-a)^n$ must be at least $R$. It follows that $\sum_{n=0}^{\infty} e_n (x-a)^n$
is absolutely convergent for all $x \in (a-R, a+R)$.
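
The coefficients in Theorem 9.5.10 are computed by the same bookkeeping used for multiplying polynomials, but without a top degree. The following Python sketch is an illustration only; the function name `cauchy_product` is chosen here, and only finitely many coefficients can be computed. As a check it squares the geometric series with all coefficients equal to 1, whose sum is 1/(1 − x) on (−1, 1); the resulting coefficients n + 1 are those of the known expansion of 1/(1 − x)² on the same interval.

```python
def cauchy_product(c, d):
    """Return [e_0, ..., e_(N-1)] with e_n = sum of c[k] * d[n - k] for k = 0..n.

    Here N is the number of coefficients available in both input lists.
    """
    n_terms = min(len(c), len(d))
    return [sum(c[k] * d[n - k] for k in range(n + 1)) for n in range(n_terms)]


# First eight coefficients of the geometric series 1 + x + x^2 + ...
geometric = [1] * 8
squared = cauchy_product(geometric, geometric)
```

Each e_n uses only c_0, ..., c_n and d_0, ..., d_n, so the first N coefficients of the product are determined by the first N coefficients of each factor.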


We will see further results about power series, for example how to differentiate 
and integrate them, in Section 10.4; we have to wait until then because such facts 
about power series rely upon the notion of uniform convergence of series of functions, 
which is discussed in Section 10.3. 


Reflections 


This section on power series will probably leave the reader feeling unsatisfied— 
none of the really useful things one learns about power series, such as representing a 
function by its Taylor series, are touched upon in this section, and they are instead 
found in Section 10.4. That delay is due to the fact that in order to prove those 
useful results about power series, it will first be necessary to develop the notion of 
uniformly convergent sequences and series of functions; that topic will be discussed 
in Sections 10.2 and 10.3. However, in order to see which aspects of power series 
make use of uniform convergence and which do not, and in order to keep Section 10.4 
to a manageable size, we have included in the present section as much about power 
series as could be said without making use of uniform convergence. 


Exercises 


Exercise 9.5.1. Let $\sum_{n=0}^{\infty} c_n x^n$ be a power series in $\mathbb{R}$. Suppose that this power series
is convergent for x = 9 and is divergent for x = —12. For each of the following 
statements, say whether it is true, false, or not determined by the given information. 


(1) The series is absolutely convergent for x = 7. 
(2) The series is convergent for x = —9. 
(3) The series is convergent for x = 10. 
(4) The series is convergent for x = 15. 


Exercise 9.5.2. Find the radius of convergence and interval of convergence of the
following power series.


@ 34".
(2) Deo 5. 
(3) D9 SP 


Exercise 9.5.3. [Used in Example 10.4.11.] Prove that the interval of convergence of
the power series $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} (x-1)^n$ is $(0, 2]$.


Exercise 9.5.4. For each of the following intervals, either find an example of a power 
series that has the interval as its interval of convergence, or explain why the interval 
is not the interval of convergence of any power series. 


(1) (−3, 3).
(2) [3, 11].
(3) [4, ∞).


Exercise 9.5.5. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ be a power series in $\mathbb{R}$, and let $r \in \mathbb{N}$. Prove
that $\sum_{n=0}^{\infty} c_n (x-a)^n$ and $\sum_{n=0}^{\infty} c_{n+r} (x-a)^n$ have the same interval of convergence.
[Use Exercise 9.2.4.]


Exercise 9.5.6. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ and $\sum_{n=0}^{\infty} d_n (x-a)^n$ be power series in $\mathbb{R}$.
Suppose that $|c_n| \le |d_n|$ for all $n \in \mathbb{N} \cup \{0\}$.


(1) Prove that the radius of convergence of $\sum_{n=0}^{\infty} d_n (x-a)^n$ is less than or equal
to the radius of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$.

(2) Is it necessarily the case that the interval of convergence of $\sum_{n=0}^{\infty} d_n (x-a)^n$ is
a subset of the interval of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$? Give a proof or a
counterexample.

(3) Find an example of power series $\sum_{n=0}^{\infty} b_n x^n$ and $\sum_{n=0}^{\infty} e_n x^n$ such that $|b_n| <
|e_n|$ for all $n \in \mathbb{N} \cup \{0\}$, and that the two series have the same intervals of
convergence.


Exercise 9.5.7. [Used in Exercise 10.4.9.] In this exercise we use the sequence of
Fibonacci numbers, denoted $\{F_n\}_{n=1}^{\infty}$, which was defined in Example 8.4.10.


(1) Prove that the radius of convergence of the power series $\sum_{n=0}^{\infty} F_{n+1} x^n$ is at
least $\frac{1}{2}$. [Use Exercise 8.4.12 (1).]

(2) Let $I$ be the interval of convergence of $\sum_{n=0}^{\infty} F_{n+1} x^n$, and let $f\colon I \to \mathbb{R}$ be
defined by $f(x) = \sum_{n=0}^{\infty} F_{n+1} x^n$ for all $x \in I$. Prove that $f(x) - x f(x) - x^2 f(x) = 1$
for all $x \in I$.


Exercise 9.5.8. Let $\sum_{n=0}^{\infty} c_n (x-a)^n$ be a power series in $\mathbb{R}$. Suppose that the sequence
$\{c_n\}_{n=0}^{\infty}$ is bounded. Prove that $\sum_{n=0}^{\infty} c_n (x-a)^n$ has radius of convergence at least 1.


Exercise 9.5.9. [Used in Theorem 9.5.8.] Prove Theorem 9.5.8. [Use Exercise 9.4.9.]


9.6 Historical Remarks 


The reader might view the study of series as separate from the core ideas of calculus
such as derivatives and integrals, and it might therefore appear as if the history of
series might be separate from the historical remarks seen in previous chapters. In fact,
derivatives and integrals involve infinite processes (via infinitesimals or limits), as do 
series (which are infinite sums), and hence, as the reader will see, the history of series 
involves many of the same mathematicians encountered in the history of derivatives 
and integrals. 


Ancient World 


Arithmetic and geometric series were known in India in the 4th century BCE, and
they were further explored in the period 300-1350 CE.

Although Euclid (c. 325–c. 265 BCE) did not consider infinite series, there is a
formula (expressed in words) for the partial sums of geometric series in Proposition 35
of Book IX of the Elements. In modern notation, if $a_1 + a_2 + a_3 + \cdots$ is a geometric
series, Euclid’s formula is $\frac{a_{n+1} - a_1}{s_n} = \frac{a_2 - a_1}{a_1}$, where $s_n$ is the $n$-th partial sum; this
formula is equivalent to the formula for the partial sum given in Example 9.2.4 (4).
Also, Proposition 1 of Book X of the Elements, which is the basis for the method of
exhaustion, is essentially a fact about certain partial sums of series becoming as close
as desired to a given number, though Euclid did not think of it in that way.

Archimedes (287-212 BCE), in the Quadrature of the Parabola, showed a geometric
equivalent of the sum $\sum_{n=0}^{\infty} \frac{1}{4^n} = \frac{4}{3}$; this sum of a geometric series might be
considered the first sum of a series in Europe, though Archimedes did not think in
terms of infinite sums.


Medieval Period 


The scholars at Merton College at Oxford in the 14th century were led from their 
philosophical interests to various infinite series. One of them, Richard Swineshead, 
also known as Suiseth and Calculator, argued in Liber calculationum of around 1350 
that $\sum_{n=1}^{\infty} \frac{n}{2^n} = 2$ (which is proved in Example 9.4.8, though with both sides of the
equation divided by 2). Swineshead, whose approach was verbal and not rigorous, 
viewed this problem in terms of a body with uniform motion for half a period of time, 
then twice the velocity for the next quarter period, then three times the velocity for an 
eighth period, ad infinitum, and he argued that the total distance traveled would be four 
times the distance covered in the first half period. This sum of a series appears to be 
the first sum of a series in Europe since the geometric series summed by Archimedes 
mentioned above. 

Nicole Oresme (1323-1382), around 1350, discussed geometric series, and also 
proved that the harmonic series is divergent by the proof commonly used today (which 
is given in Example 9.2.4 (5)). Oresme gave a clever picture proof of the sum of twice 
Swineshead’s series, making use of geometric series. The methods of Swineshead and 
Oresme were further developed in the 15th and 16th centuries, though the significance 
of this study of series was not the particular results about series that were obtained, 
but the progress in the gradual acceptance of infinite processes by mathematicians. 
Series became important in mathematics only from the 17th century, as a way to avoid 
the method of exhaustion in area and volume calculations. 
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Renaissance 


François Viète (1540-1603) showed an understanding of geometric series, as part of
his work on the area of the circle in 1593. For example, he computed $\sum_{n=0}^{\infty} \frac{1}{4^n} = \frac{4}{3}$
and showed that this sum was related to Archimedes’ geometric computation of the
area of the parabola. Viète had the intuitive idea of the convergence of series being
based upon the partial sums.


Seventeenth Century


A good early account of the summation of geometric series is due to Grégoire de 
Saint-Vincent (1584-1667) in Opus geometricum quadraturae circuli et sectionum 
coni of 1647. He had the intuitive idea of the convergence of series being based upon 
the partial sums, he used geometric series to provide a method for finding the areas of 
conics, and he used such series to resolve Zeno’s Achilles paradox. Leibniz studied 
this work at the recommendation of Huygens. 

Evangelista Torricelli (1608-1647) made use of geometric series in area problems. 
He proved, geometrically, that if $S = a + ar + ar^2 + \cdots$, then $\frac{S}{a} = \frac{a}{a - ar}$, which is


equivalent to our familiar formula $S = \frac{a}{1-r}$, assuming that $a \neq 0$.

Pietro Mengoli (1626-1686), a student of Cavalieri who was influenced by 
Grégoire de Saint-Vincent and Torricelli, contributed to the development of series 
in 1650. He had two axioms about series, which stated in modern terms are: (1) if
$\sum_{n=1}^{\infty} a_n = \infty$, then for any $x > 0$, there is some $N \in \mathbb{N}$ such that $\sum_{n=1}^{N} a_n > x$; and (2)
if a series with positive terms is convergent, then any rearrangement is convergent, 
and has the same sum as the original series (which is proved in Theorem 9.4.12). 
Mengoli deduced, among other things, that if the sequence of partial sums of a series 
with positive terms is bounded then the series is convergent (which is proved in
Lemma 9.3.1). Mengoli obtained sums such as $\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1$ and $\sum_{n=1}^{\infty} \frac{1}{n(n+1)(n+2)} = \frac{1}{4}$.
He also gave a different proof than Oresme that the harmonic series is divergent (this 
proof is given in Exercise 9.2.9). 

John Wallis (1616-1703), in Arithmetica infinitorum of 1656, used clever (though 
far from rigorous, even to his contemporaries) analogical thinking to derive an infinite 
product formula for $\frac{\pi}{2}$. His method used Pascal’s triangle, and hence the binomial
coefficients, and this work helped inspire Newton to discover the binomial series, 
which he then used in his version of calculus. In Mathesis universalis of 1657, Wallis 
attempted to arithmetize Euclid’s Elements. For example, he wrote the first n terms 
of a geometric series in the way we write them today as $A, AR, \ldots, AR^{n-1}$, and he
proved that the sum of these terms is $\frac{VR - A}{R - 1}$, where $V = AR^{n-1}$. Frans van Schooten
(1615-1660) published another algebraic treatment of geometric series in 1657. 


Newton and Leibniz 


Prior to their work on calculus, both Isaac Newton (1643-1727) and Gottfried von
Leibniz (1646-1716), whose approaches were quite different, first looked at some
questions about series; Newton looked at the binomial series, which is an example of
power series, whereas Leibniz initially looked at series of numbers rather than power
series.

Leibniz’s first accomplishment in mathematics was, in answer to a question
suggested by Huygens, showing that $\frac{1}{1} + \frac{1}{3} + \frac{1}{6} + \frac{1}{10} + \frac{1}{15} + \cdots = 2$ (which is
proved in Example 9.2.4 (3)). Leibniz’s approach was to look at the series of differences
of consecutive terms. It was pointed out to Leibniz that this result was essentially 
known already, though he had not been aware of that fact. We might speculate that if 
Leibniz had been aware that this series was known, then he would presumably not 
have looked at it, and then he might not have started thinking about the relation of 
sums and differences, and it was this relation that inspired his work on derivatives and 
integrals, which he viewed as a kind of difference and sum, respectively. In general, 
Leibniz handled series in a formal way, and was willing to work with divergent 
series, though he managed to obtain some correct results when using such series. 
Leibniz formulated and proved the Alternating Series Test, including the part about 
the difference between the partial sums and the limit, in an unpublished work of 
around 1676, and more completely in a letter to Johann Bernoulli in 1713.


Eighteenth Century 


During the 17th and 18th centuries it was assumed that whatever held for finite
sums also held for infinite sums. For example, because $\sum_{n=1}^{k} (a_n + b_n) = \sum_{n=1}^{k} a_n +
\sum_{n=1}^{k} b_n$ for any finite sums, it was assumed that $\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$
automatically held for series. Similarly, because the terms of a finite sum can be
rearranged without changing the value of the sum, it was assumed that the same held 
for series, though we now know that that is not true for conditionally convergent series 
(as seen in a special case in Example 9.4.11, and more generally in Theorem 9.4.15). 

In the 18th century, the approach to series, and more generally to real analysis, 
changed in the direction of separating analysis from geometry. The approach to series 
became more formal, and questions of convergence were considered for the purpose 
of specific applications, but not in the derivation of general facts about series. As such, 
divergent series were allowed, though sometimes with strange results. On the one 
hand, Jakob Bernoulli (1654-1705) noted in 1696 that $1 - 1 + 1 - 1 + 1 - \cdots$ had no
sum. On the other hand, Daniel Bernoulli (1700-1782) said in 1724 that substituting
$x = 1$ into the expression $\frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots$ yielded $1 - 1 + 1 - 1 + 1 - \cdots = \frac{1}{2}$,
ignoring the interval of convergence. He also said that $1 + 2 + 4 + 8 + 16 + \cdots = -1$
by arguing that if $S = 1 + 2 + 4 + 8 + 16 + \cdots$, then $S - 1 = 2 + 4 + 8 + 16 + \cdots$, so
$\frac{S-1}{2} = 1 + 2 + 4 + 8 + 16 + \cdots$, which meant that $\frac{S-1}{2} = S$, which implied that $S = -1$;
the flaw, of course, is the assumption that there is such a number $S$. Such issues were
debated during this period.

However, even if some strange special cases were deduced using the formal 
approach to series, such special cases were not considered sufficient to invalidate the 
general results, which did not seem to be viewed as applicable to all cases. Indeed, the 
formal approach allowed for the discovery of some very useful results, for example 
the theory of generating functions of Pierre-Simon Laplace (1749-1827). Power
series, and also trigonometric series, were employed in this period in the solution of
differential equations. Some of the mathematicians who helped develop series in the
mid-18th century were Abraham de Moivre (1667-1754), Daniel Bernoulli, James 
Stirling (1692-1770) and Colin Maclaurin (1698-1746). 

The work of Leonhard Euler (1707-1783) on series was very influential. Among 
many other results, Euler proved that the sum of the alternating harmonic series is 
In2 (which is proved in Exercise 9.3.10), and he stated and used, in informal terms, 
the idea that if a series $\sum_{n=1}^{\infty} a_n$ with positive terms is convergent, then $\sum_{n=p}^{k} a_n$ can
be made as small as desired for large enough p and all k > p (which is proved in
Exercise 9.2.11). Similarly to his predecessors, Euler thought that series could be 
properly manipulated even if divergent. 


Nineteenth Century 


The formal approach to series of the 18th century was replaced early in the 19th 
century with an approach to series that avoided formal manipulation, and instead 
recognized the importance of convergence (especially for power series and other 
series of functions). 

An early work that involved convergence was by Anastacio da Cunha (1744-1787) 
in 1782, who gave a somewhat imprecise statement that seems to be similar to the idea
that a series is convergent if its sequence of partial sums is a Cauchy sequence. However,
da Cunha worked in isolation in Portugal, and his ideas were not widely noticed. 

Joseph Fourier (1768-1830) helped bring attention to the importance of the 
convergence of series via his work on trigonometric series in Théorie analytique 
de la chaleur, submitted in 1807 but published only in 1822. Fourier used what we 
now call Fourier series to solve the partial differential equation that describes the 
propagation of heat, and he was very careful to specify for which values of x his series 
of functions represented the original function. The first person to publish a paper with 
this new approach to series was Carl Friedrich Gauss (1777-1855) in 1812, who said 
that one should restrict attention to where a power series converged. Gauss’ idea of
when $\sum_{n=0}^{\infty} c_n (x-a)^n$ converged was when $\lim_{n \to \infty} c_n = 0$, which we now know is not the
correct approach, but nonetheless he correctly asserted that the issue of convergence
was important. Bernard Bolzano (1781-1848) took this new approach in 1816, but 
similarly to his other ideas, his work on series was not widely known and did not 
influence subsequent developments. 

The first systematic study of the convergence of series, and the first systematic 
exposition of this new approach, were due to Augustin Louis Cauchy (1789-1857) in 
the textbooks Cours d’analyse de l’École Royale Polytechnique of 1821 and Résumé
des leçons à l’École Royale Polytechnique of 1823. Cauchy, who stressed the need for
rigor in real analysis, explicitly stated that divergent series have no sums, and that
power series should be used only on their intervals of convergence. Cauchy proved
that power series converge precisely on intervals, and that the power series expansion
of a function is unique (by a proof similar to Euler’s). Cauchy separated the question of
convergence of series from finding their sums if convergent. Cauchy’s definition of
the convergence of a series (of numbers) was exactly the one we use today, though
his definition of limit for the sequence of partial sums, though intuitively correct,
was still informal. Cauchy used his definition to show, for example, that $\sum_{n=0}^{\infty} x^n$ is
convergent if and only if $|x| < 1$. Cauchy asserted that a series is convergent if and
only if, in modern terms, its sequence of partial sums is a Cauchy sequence; he did
not seem to realize that he needed the Cauchy Completeness Theorem to prove this
equivalence. Cauchy proved a number of results that had, in the 18th century, been
thought true simply because they were the analogs for series of facts about finite sums,
for example the fact that if $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent, then $\sum_{n=1}^{\infty} (a_n + b_n)$
is convergent and $\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$. Similarly to Gauss, Cauchy
stressed the importance of convergence tests, because he felt that one needed to know 
if a series was convergent before using it, and he proved, among others, the Ratio Test 
(which had been used by Gauss), the Root Test, the Alternating Series Test and the 
Integral Test. 


10 


Sequences and Series of Functions 


10.1 Introduction 


In this chapter, our final one, we bring together a number of ideas that we saw 
in previous chapters. We learned about sequences of numbers in Chapter 8, and 
series of numbers in Chapter 9. In the present chapter we study sequences and series 
of functions, which we will then use to further our study of power series (which 
was commenced in Section 9.5), and finally to construct a continuous but nowhere 
differentiable function, a fitting point at which to conclude our study of introductory 
real analysis. 

Whereas our primary interest here is in series of functions, of which power 
series are an example, our main technicalities, which concern the notion of uniform 
convergence, are to be found in Section 10.2, which treats sequences of functions. 


10.2 Sequences of Functions 


In Section 8.2 we saw the notion of a sequence of numbers, for example $\{3^n\}_{n=1}^{\infty}$. We
now turn to the analogous notion of a sequence of functions, for example the sequence
$\{x^n\}_{n=1}^{\infty}$, where for each $n \in \mathbb{N}$ we think of $x^n$ as a shorthand notation for the function
$f_n\colon \mathbb{R} \to \mathbb{R}$ defined by $f_n(x) = x^n$ for $x \in \mathbb{R}$.

Although we saw some of the basic properties of power series in Section 9.5, there 
are some substantial results about such series that we did not prove in that section, 
due to the lack of some important tools. In order to prove these results about power 
series, we need to change the way we view them. Consider, for example, the power 
series 

$$1 + x + x^2 + x^3 + x^4 + \cdots.$$


In our original discussion of power series in Section 9.5, we thought of such a power
series as a collection of series of numbers, one series for each value of x. Now, by
contrast, we want to view this power series as a single series, the terms of which form
the sequence of functions $\{x^n\}_{n=0}^{\infty}$. This shift in point of view might seem minor at
first, but it will turn out to be very important.

As was the case for series of numbers, in order to understand series of functions, 
we first need to learn about sequences of functions. 


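Definition 10.2.1. Let $A \subseteq \mathbb{R}$ be a non-empty set. Let $\mathcal{F}(A, \mathbb{R})$ denote the set of all
functions $A \to \mathbb{R}$. A sequence of functions $A \to \mathbb{R}$ is a function $F\colon \mathbb{N} \to \mathcal{F}(A, \mathbb{R})$.
If $F\colon \mathbb{N} \to \mathcal{F}(A, \mathbb{R})$ is a sequence of functions, and if $f_i = F(i)$ for all $i \in \mathbb{N}$, then
we write $\{f_n\colon A \to \mathbb{R}\}_{n=1}^{\infty}$, or $\{f_n\}_{n=1}^{\infty}$ when there is no ambiguity, to denote the
sequence of functions. Each function $f_n$, where $n \in \mathbb{N}$, is called a term of the sequence
$\{f_n\}_{n=1}^{\infty}$. △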


Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of functions $A \to \mathbb{R}$. What would it mean to
say that $\{f_n\}_{n=1}^{\infty}$ converges to a function $f\colon A \to \mathbb{R}$? In contrast to the situation for
sequences of numbers, where there is only one plausible notion of convergence, it
turns out that for sequences of functions there is more than one possible way to define
convergence, and the different definitions are not equivalent. We will discuss two
approaches, called pointwise convergence and uniform convergence. The former type
of convergence is the simplest definition of convergence, but the latter type of
convergence is much more nicely behaved.

We start our discussion with pointwise convergence, where we simply examine
the convergence of the sequence $\{f_n(x)\}_{n=1}^{\infty}$ for each $x \in A$ separately.


Definition 10.2.2. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$ be a sequence of
functions $A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. The sequence of functions $\{f_n\}_{n=1}^\infty$
converges pointwise to $f$ if $\lim_{n \to \infty} f_n(x) = f(x)$ for all $x \in A$. If $\{f_n\}_{n=1}^\infty$ converges
pointwise to some function $A \to \mathbb{R}$, we say that $\{f_n\}_{n=1}^\infty$ is pointwise convergent. △


Before seeing some examples of pointwise convergence, we need the following 
lemma, which shows, as expected, that if a sequence of functions converges pointwise 
to a function, then that function is unique. 


Lemma 10.2.3. Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $\{f_n\}_{n=1}^\infty$ be a sequence of
functions $A \to \mathbb{R}$. If $\{f_n\}_{n=1}^\infty$ converges pointwise to $f$ for some function $f\colon A \to \mathbb{R}$,
then $f$ is unique.


Proof. Suppose that $\{f_n\}_{n=1}^\infty$ converges pointwise to $f$ for some function $f\colon A \to \mathbb{R}$.
Let $x \in A$. Then $\lim_{n \to \infty} f_n(x) = f(x)$. By Lemma 8.2.3 we know that $\lim_{n \to \infty} f_n(x)$ is unique.
Hence $f$ is unique.


Example 10.2.4. 


(1) For each $n \in \mathbb{N}$, let $f_n\colon [-1,1] \to \mathbb{R}$ be defined by $f_n(x) = 1 - \frac{x}{n}$ for all
$x \in [-1,1]$. Let $f\colon [-1,1] \to \mathbb{R}$ be defined by $f(x) = 1$ for all $x \in [-1,1]$. Let
$y \in [-1,1]$. Using Example 8.2.4 (1) (2), and Theorem 8.2.9, we see that

$$\lim_{n \to \infty} f_n(y) = \lim_{n \to \infty} \left(1 - \frac{y}{n}\right) = 1 = f(y).$$

It follows that $\{f_n\}_{n=1}^\infty$ converges pointwise to $f$.

(2) For each $n \in \mathbb{N}$, let $g_n\colon [0,1] \to \mathbb{R}$ be defined by $g_n(x) = x^n$ for all $x \in [0,1]$.
Let $g\colon [0,1] \to \mathbb{R}$ be defined by

$$g(x) = \begin{cases} 0, & \text{if } x \in [0,1) \\ 1, & \text{if } x = 1. \end{cases}$$

Let $z \in [0,1]$. Then $\lim_{n \to \infty} g_n(z) = \lim_{n \to \infty} z^n$. It follows from Example 8.2.13 that if $z \in [0,1)$
then $\lim_{n \to \infty} g_n(z) = 0$, and that if $z = 1$ then $\lim_{n \to \infty} g_n(z) = 1$. Hence $\lim_{n \to \infty} g_n(z) = g(z)$. It
follows that $\{g_n\}_{n=1}^\infty$ converges pointwise to $g$.

For each $n \in \mathbb{N}$, the function $g_n$ is differentiable by Exercise 4.3.5 and
Exercise 4.2.3 (5), and therefore $g_n$ is continuous by Theorem 4.2.4. However, the
function $g$ is neither differentiable nor continuous. Hence, a sequence of continuous
functions can converge pointwise to a discontinuous function, and a sequence of
differentiable functions can converge pointwise to a non-differentiable function.
Another way of expressing this problem with respect to continuity is by noting that

$$\lim_{x \to 1^-} \lim_{n \to \infty} g_n(x) \ne \lim_{n \to \infty} \lim_{x \to 1^-} g_n(x).$$

In general in real analysis, it cannot always be assumed that the order of limits can
be interchanged, unless there is a specific theorem that justifies doing so in the given
situation. (Another situation involving sequences of functions in which limits cannot
be interchanged is seen in Exercise 10.2.10.)

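(3) For each $n \in \mathbb{N}$, let $h_n\colon [0,1] \to \mathbb{R}$ be defined by

$$h_n(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0,1] \text{ and } x = \frac{a}{b} \text{ for some } a \in \mathbb{N} \cup \{0\} \text{ and } b \in \{1, \ldots, n\} \\ 0, & \text{otherwise.} \end{cases}$$

Let $h\colon [0,1] \to \mathbb{R}$ be defined by

$$h(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0, & \text{otherwise.} \end{cases}$$

The reader is asked in Exercise 10.2.1 (1) to prove that $\{h_n\}_{n=1}^\infty$ converges pointwise
to $h$.

Let $n \in \mathbb{N}$. There are only finitely many numbers $x \in [0,1] \cap \mathbb{Q}$ such that $x = \frac{a}{b}$
for some $a \in \mathbb{N} \cup \{0\}$ and $b \in \{1, \ldots, n\}$. It follows that $h_n$ is zero except at finitely
many points, and hence $h_n$ is integrable by Exercise 5.3.3 (2). On the other hand, the
function $h$ is not integrable, as was seen in Example 5.2.6 (3). Hence, a sequence of
integrable functions can converge pointwise to a non-integrable function.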

(4) For each $n \in \mathbb{N}$ such that $n \ge 2$, let $p_n\colon [0,1] \to \mathbb{R}$ be defined by

$$p_n(x) = \begin{cases} n^2 x, & \text{if } x \in [0, \frac{1}{n}) \\ 2n - n^2 x, & \text{if } x \in [\frac{1}{n}, \frac{2}{n}) \\ 0, & \text{if } x \in [\frac{2}{n}, 1]. \end{cases}$$

See Figure 10.2.1 for the graph of $p_n$. Let $p\colon [0,1] \to \mathbb{R}$ be defined by $p(x) = 0$ for all
$x \in [0,1]$. The reader is asked in Exercise 10.2.1 (2) to prove that $\{p_n\}_{n=2}^\infty$ converges
pointwise to $p$.

Fig. 10.2.1.


For each $n \in \mathbb{N}$ such that $n \ge 2$, it can be verified that the function $p_n$ is continuous,
and hence it is integrable by Theorem 5.4.11, and that $\int_0^1 p_n(x)\,dx = 1$; the details are left to
the reader. We know by Example 5.2.6 (1) that $p$ is integrable and $\int_0^1 p(x)\,dx = 0$.
Hence, even though $\{p_n\}_{n=2}^\infty$ converges pointwise to $p$, the sequence of numbers
$\left\{\int_0^1 p_n(x)\,dx\right\}_{n=2}^\infty$ does not converge to $\int_0^1 p(x)\,dx$, which can be written as

$$\lim_{n \to \infty} \int_0^1 p_n(x)\,dx \ne \int_0^1 \lim_{n \to \infty} p_n(x)\,dx. \qquad ◊$$
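The behavior in Example 10.2.4 (4) can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper names `p` and `integral_p` are chosen here for illustration, and exact rational arithmetic is used so that no rounding is involved. It evaluates the tent functions at a fixed point and computes their integrals.

```python
from fractions import Fraction

def p(n, x):
    """Tent function p_n on [0, 1] from Example 10.2.4 (4), for n >= 2."""
    x = Fraction(x)
    if x < Fraction(1, n):
        return n * n * x
    if x < Fraction(2, n):
        return 2 * n - n * n * x
    return Fraction(0)

def integral_p(n):
    """Exact integral of p_n over [0, 1].

    p_n is piecewise linear with corners at 0, 1/n, 2/n and 1, so the
    trapezoid rule on those four nodes gives the exact value.
    """
    nodes = [Fraction(0), Fraction(1, n), Fraction(2, n), Fraction(1)]
    total = Fraction(0)
    for left, right in zip(nodes, nodes[1:]):
        total += (right - left) * (p(n, left) + p(n, right)) / 2
    return total

x0 = Fraction(1, 10)                      # a fixed point of [0, 1]
indices = (2, 5, 10, 20, 40)
values_at_x0 = [p(n, x0) for n in indices]
integrals = [integral_p(n) for n in indices]

print("p_n(1/10):", values_at_x0)         # eventually 0 once 2/n <= 1/10
print("integrals:", integrals)            # each integral equals 1
```

At the fixed point the values are eventually zero, in line with pointwise convergence to $p$, while every integral equals 1 rather than approaching 0.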

As we saw in Example 10.2.4 (2)-(4), pointwise convergence of sequences of 
functions does not always behave nicely with respect to fundamental properties 
such as continuity, differentiability and integrability. The problem, it turns out, is 
not that continuity, differentiability and integrability are too tricky for convergence 
of sequences of functions to handle, but rather that pointwise convergence is not 
necessarily the best way to define convergence of sequences of functions. 

To obtain a closer look at pointwise convergence, we can express the definition of 
pointwise convergence using logical symbols as 


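$$(\forall x \in A)(\forall \varepsilon > 0)(\exists N \in \mathbb{N})\bigl[(n \in \mathbb{N} \wedge n \ge N) \rightarrow |f_n(x) - f(x)| < \varepsilon\bigr].$$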


As always, the order of the quantifiers is crucial. Because we are first given $x$ and
$\varepsilon$, and we then show that there exists an appropriate $N$, the choice of $N$ can depend
upon both $x$ and $\varepsilon$. Intuitively, the reason the choice of $N$ might depend upon $x$ as well
as $\varepsilon$ is because the sequence $\{f_n(x)\}_{n=1}^\infty$ might converge to $f(x)$ "faster" for some
values of $x$ and "slower" for some other values of $x$. It turns out that it is precisely
this difference in the rate of convergence (speaking informally) for different $x$ that
leads to the problems with the relation between pointwise convergence and continuity,
differentiability and integrability. As we will see below, we can avoid this sort of
problem if we consider a different type of convergence of sequences of functions, in
which for a given $\varepsilon$, the same $N$ works for all $x$. In logical symbols, we want this new
type of convergence to be defined by


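$$(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall x \in A)\bigl[(n \in \mathbb{N} \wedge n \ge N) \rightarrow |f_n(x) - f(x)| < \varepsilon\bigr].$$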
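To see concretely how $N$ can depend on $x$, consider the functions $g_n(x) = x^n$ of Example 10.2.4 (2) at points of $[0,1)$, where the pointwise limit is 0. The following Python sketch is an illustration only: the helper name `smallest_N` and the sample points are assumptions made here. For a fixed tolerance it finds the smallest index from which $x^n$ stays below that tolerance.

```python
def smallest_N(x, eps):
    """Smallest N >= 1 with x**n < eps for every n >= N, assuming 0 <= x < 1.

    Because x**n is decreasing in n for such x, the first n with
    x**n < eps works for all larger n as well.
    """
    n = 1
    while x ** n >= eps:
        n += 1
    return n

eps = 0.01
sample_points = [0.5, 0.9, 0.99, 0.999]
needed = [smallest_N(x, eps) for x in sample_points]

for x, N in zip(sample_points, needed):
    print(f"x = {x}: N = {N}")
```

The required $N$ grows without bound as $x$ approaches 1, so no single $N$ serves every $x \in [0,1)$ for this tolerance.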


We now state this definition properly. 


Definition 10.2.5. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$ be a sequence of
functions $A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. The sequence of functions $\{f_n\}_{n=1}^\infty$
converges uniformly to $f$ if for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and
$n \ge N$ imply $|f_n(x) - f(x)| < \varepsilon$ for all $x \in A$. If $\{f_n\}_{n=1}^\infty$ converges uniformly to some
function $A \to \mathbb{R}$, we say that $\{f_n\}_{n=1}^\infty$ is uniformly convergent. △


The intuitive idea of a sequence of functions $\{f_n\}_{n=1}^\infty$ converging uniformly to a
function $f$ is that for any $\varepsilon > 0$, if $n$ is sufficiently large then the graph of $f_n$ will be
within a band that is distance $\varepsilon$ above and below the graph of $f$; see Figure 10.2.2,
where the solid line is the graph of $f$, the dashed lines indicate the edges of the band
that is distance $\varepsilon$ above and below the graph of $f$, and the dotted line is the graph of
$f_n$.


Fig. 10.2.2. 


The first of the following two lemmas is derived immediately from the definition 
of pointwise convergence and uniform convergence, and the second is derived from 
the first together with Lemma 10.2.3; we omit the details. 


Lemma 10.2.6. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$ be a sequence of functions
$A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. If $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$, then
$\{f_n\}_{n=1}^\infty$ converges pointwise to $f$.


Lemma 10.2.7. Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $\{f_n\}_{n=1}^\infty$ be a sequence of
functions $A \to \mathbb{R}$. If $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$ for some function $f\colon A \to \mathbb{R}$,
then $f$ is unique.

We now see in the second part of the following example that pointwise convergence
does not necessarily imply uniform convergence.


Example 10.2.8. 


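(1) For each $n \in \mathbb{N}$, let $k_n\colon [-1,1] \to \mathbb{R}$ be defined by $k_n(x) = \frac{x^n}{n}$ for all
$x \in [-1,1]$. Let $k\colon [-1,1] \to \mathbb{R}$ be defined by $k(x) = 0$ for all $x \in [-1,1]$. We will prove
that $\{k_n\}_{n=1}^\infty$ converges uniformly to $k$. Let $\varepsilon > 0$. By Corollary 2.6.8 (2) there is
some $N \in \mathbb{N}$ such that $\frac{1}{N} < \varepsilon$. Suppose that $n \in \mathbb{N}$ and $n \ge N$. Let $x \in [-1,1]$. Then

$$|k_n(x) - k(x)| = \left|\frac{x^n}{n} - 0\right| = \frac{|x|^n}{n} \le \frac{1}{n} \le \frac{1}{N} < \varepsilon.$$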

Although we said that uniform convergence is supposed to work better than
pointwise convergence with respect to continuity, differentiability and integrability,
and indeed we will see that that is true, even uniform convergence is not perfect when it
comes to differentiability.

Observe that $k_n$ is differentiable for each $n \in \mathbb{N}$, and that $k$ is differentiable.
However, if $n \in \mathbb{N}$, then $k_n'(x) = x^{n-1}$ for all $x \in [-1,1]$, where we use one-sided
derivatives at the endpoints of the closed interval. Hence

$$\lim_{n \to \infty} k_n'(1) = \lim_{n \to \infty} 1^{n-1} = 1 \ne 0 = k'(1).$$

Moreover, using Example 8.2.4 (4) we see that $\lim_{n \to \infty} k_n'(-1) = \lim_{n \to \infty} (-1)^{n-1}$
does not exist. Hence, even though $\{k_n\}_{n=1}^\infty$ converges uniformly to $k$, the sequence
$\{k_n'\}_{n=1}^\infty$ does not converge pointwise to $k'$.

(2) Let $\{g_n\}_{n=1}^\infty$ and $g$ be as defined in Example 10.2.4 (2). It was seen in that
example that $\{g_n\}_{n=1}^\infty$ converges pointwise to $g$. We will now prove that $\{g_n\}_{n=1}^\infty$ does
not converge uniformly to $g$, and hence we will see that pointwise convergence does
not necessarily imply uniform convergence.

To prove that $\{g_n\}_{n=1}^\infty$ does not converge uniformly to $g$ we need to find some
$\varepsilon > 0$ such that for each $N \in \mathbb{N}$, there is some $x \in [0,1]$ and there is some $n \in \mathbb{N}$ such
that $n \ge N$ and $|g_n(x) - g(x)| \ge \varepsilon$. It is sufficient to find some $\varepsilon > 0$ such that for
each $n \in \mathbb{N}$, there is some $x \in [0,1]$ such that $|g_n(x) - g(x)| \ge \varepsilon$ (this statement is a
bit stronger than is needed, but it is simpler, and we can show it in the present case).

Let $\varepsilon = \frac{1}{2}$. Let $n \in \mathbb{N}$. Let $x = \frac{1}{\sqrt[n]{2}}$; see Exercise 3.5.6 for the existence of the $n$th
root of positive real numbers. It follows from Theorem 7.2.12 (1), Exercise 7.2.11 (1)
and Theorem 7.2.13 (2) that $1 = \sqrt[n]{1} < \sqrt[n]{2}$. Hence $x \in (0,1)$. Then

$$|g_n(x) - g(x)| = \left|\left(\frac{1}{\sqrt[n]{2}}\right)^n - 0\right| = \frac{1}{2} \ge \varepsilon.$$

It follows that $\{g_n\}_{n=1}^\infty$ does not converge uniformly to $g$.

Graphically, the fact that $\{g_n\}_{n=1}^\infty$ does not converge uniformly to $g$ is seen in
Figure 10.2.3, where the graphs of $g_2$, $g_4$, $g_6$ and $g_8$ are shown. Observe that for any
$\varepsilon \in (0,1)$, the band that is distance $\varepsilon$ above and below the graph of $g$ on the half-open
interval $[0,1)$ will not contain the graph of any of the functions $g_n$, no matter how
large $n$ is. ◊
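The contrast between the two parts of Example 10.2.8 can also be seen numerically. The Python sketch below is illustrative only: the helper `sup_distance` and the sample grid are assumptions introduced here, and a maximum over finitely many sample points is merely a lower bound for the true least upper bound. It measures how far $k_n$ is from $k$ on $[-1,1]$, and evaluates $g_n$ at the point $x = 1/\sqrt[n]{2}$ used in the proof above.

```python
def sup_distance(f, g, points):
    """Largest value of |f(x) - g(x)| over a finite sample of points.

    This is only a lower bound for the supremum over the whole domain.
    """
    return max(abs(f(x) - g(x)) for x in points)

def k_n(n):
    return lambda x: x ** n / n          # k_n(x) = x^n / n on [-1, 1]

def g_n(n):
    return lambda x: x ** n              # g_n(x) = x^n on [0, 1]

def g(x):
    return 1.0 if x == 1 else 0.0        # pointwise limit of g_n

def zero(x):
    return 0.0                           # the function k

grid = [-1 + 2 * i / 1000 for i in range(1001)]   # sample of [-1, 1]

k_distances = []
g_gaps = []
for n in (1, 10, 100):
    k_distances.append(sup_distance(k_n(n), zero, grid))
    x_n = 0.5 ** (1 / n)                 # the point 1 / (n-th root of 2)
    g_gaps.append(abs(g_n(n)(x_n) - g(x_n)))

print("sampled distance from k_n to k:", k_distances)  # shrinks like 1/n
print("|g_n(x_n) - g(x_n)|:", g_gaps)                   # stays near 1/2
```

The first list tends to 0, which is what uniform convergence requires, whereas the second stays at about $\frac{1}{2}$ for every $n$.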


The following theorem is the analog of the Cauchy Completeness Theorem (Cor- 
ollary 8.3.16) for uniform convergence of sequences of functions. 


Theorem 10.2.9 (Cauchy Criterion for Uniform Convergence). Let $A \subseteq \mathbb{R}$ be a
non-empty set, and let $\{f_n\}_{n=1}^\infty$ be a sequence of functions $A \to \mathbb{R}$. Then $\{f_n\}_{n=1}^\infty$ is
uniformly convergent if and only if for each $\varepsilon > 0$, there is some $N \in \mathbb{N}$ such that
$n, m \in \mathbb{N}$ and $n, m \ge N$ imply $|f_n(x) - f_m(x)| < \varepsilon$ for all $x \in A$.


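Proof. First, suppose that $\{f_n\}_{n=1}^\infty$ is uniformly convergent. Then there is a function
$f\colon A \to \mathbb{R}$ such that $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$. Let $\varepsilon > 0$. By the definition
of uniform convergence, there is some $N \in \mathbb{N}$ such that $r \in \mathbb{N}$ and $r \ge N$ imply
$|f_r(x) - f(x)| < \frac{\varepsilon}{2}$ for all $x \in A$. Suppose that $n, m \in \mathbb{N}$ and $n, m \ge N$. Let $x \in A$. Then

$$|f_n(x) - f_m(x)| = |f_n(x) - f(x) + f(x) - f_m(x)| \le |f_n(x) - f(x)| + |f(x) - f_m(x)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$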


Second, suppose that for each $\varepsilon > 0$, there is some $M \in \mathbb{N}$ such that $n, m \in \mathbb{N}$
and $n, m \ge M$ imply $|f_n(x) - f_m(x)| < \varepsilon$ for all $x \in A$. Let $g\colon A \to \mathbb{R}$ be defined as
follows. Let $y \in A$. The hypothesis on $\{f_n\}_{n=1}^\infty$ implies that the sequence $\{f_n(y)\}_{n=1}^\infty$
is a Cauchy sequence. The Cauchy Completeness Theorem (Corollary 8.3.16) implies
that $\{f_n(y)\}_{n=1}^\infty$ is convergent. Let $g(y) = \lim_{n \to \infty} f_n(y)$.

Let $\varepsilon > 0$. There is some $M \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n, m \ge M$ imply
$|f_n(x) - f_m(x)| < \frac{\varepsilon}{2}$ for all $x \in A$. Suppose that $k \in \mathbb{N}$ and $k \ge M$. Let $z \in A$. We know that
$g(z) = \lim_{n \to \infty} f_n(z)$, and hence there is some $P \in \mathbb{N}$ such that $m \in \mathbb{N}$ and $m \ge P$ imply
$|f_m(z) - g(z)| < \frac{\varepsilon}{2}$. Let $Q = \max\{M, P\}$. Then

$$|f_k(z) - g(z)| = |f_k(z) - f_Q(z) + f_Q(z) - g(z)| \le |f_k(z) - f_Q(z)| + |f_Q(z) - g(z)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

We conclude that $\{f_n\}_{n=1}^\infty$ converges uniformly to $g$.


We now show that in contrast to pointwise convergence, uniform convergence 
works very nicely with respect to continuity and integrability, and somewhat nicely 
with respect to differentiability. It is because of this nice behavior of uniform con- 
vergence that such convergence is preferred to pointwise convergence. We start with 
continuity. 


Theorem 10.2.10. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$ be a sequence of
functions $A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. Suppose that $\{f_n\}_{n=1}^\infty$ converges
uniformly to $f$.


1. Let $c \in A$. If $f_n$ is continuous at $c$ for all $n \in \mathbb{N}$, then $f$ is continuous at $c$.
2. If $f_n$ is continuous for all $n \in \mathbb{N}$, then $f$ is continuous.


Proof. 


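(1) Suppose that $f_n$ is continuous at $c$ for all $n \in \mathbb{N}$. Let $\varepsilon > 0$. By the definition
of uniform convergence there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply
$|f_n(x) - f(x)| < \frac{\varepsilon}{3}$ for all $x \in A$. Because $f_N$ is continuous at $c$, there is some $\delta > 0$
such that $x \in A$ and $|x - c| < \delta$ imply $|f_N(x) - f_N(c)| < \frac{\varepsilon}{3}$. Suppose that $x \in A$ and
$|x - c| < \delta$. Then

$$\begin{aligned} |f(x) - f(c)| &= |f(x) - f_N(x) + f_N(x) - f_N(c) + f_N(c) - f(c)| \\ &\le |f(x) - f_N(x)| + |f_N(x) - f_N(c)| + |f_N(c) - f(c)| \\ &< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon. \end{aligned}$$

(2) This part of the theorem follows immediately from Part (1) of this theorem.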


Although uniform convergence is used in Theorem 10.2.10 to prove that f is 
continuous, it is more than is minimally needed to prove continuity at a single number 
c in the domain of f; a condition equivalent to the continuity of f at a single number 
is given in Exercise 10.2.11. 

We now turn to integrability, which works as nicely with respect to uniform 
convergence as does continuity. 


Theorem 10.2.11. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$\{f_n\}_{n=1}^\infty$ be a sequence of functions $[a,b] \to \mathbb{R}$ and let $f\colon [a,b] \to \mathbb{R}$ be a function.
Suppose that $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$. If $f_n$ is integrable for all $n \in \mathbb{N}$, then
$f$ is integrable and

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$


Proof. Suppose that $f_n$ is integrable for all $n \in \mathbb{N}$. We first show that
$\left\{\int_a^b f_n(x)\,dx\right\}_{n=1}^\infty$ is a Cauchy sequence. Let $\varepsilon > 0$. Because $\{f_n\}_{n=1}^\infty$ converges
uniformly to $f$, it follows from the Cauchy Criterion for Uniform Convergence
(Theorem 10.2.9) that there is some $N \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n, m \ge N$ imply
$|f_n(x) - f_m(x)| < \frac{\varepsilon}{2(b-a)}$ for all $x \in [a,b]$. Suppose that $n, m \in \mathbb{N}$ and $n, m \ge N$. Then
by Theorem 5.3.1 (2), Theorem 5.5.5 and Theorem 5.3.2 (3) we see that

$$\begin{aligned} \left|\int_a^b f_n(x)\,dx - \int_a^b f_m(x)\,dx\right| &= \left|\int_a^b [f_n(x) - f_m(x)]\,dx\right| \\ &\le \int_a^b |f_n(x) - f_m(x)|\,dx \\ &\le \frac{\varepsilon}{2(b-a)} \cdot (b-a) = \frac{\varepsilon}{2} < \varepsilon. \end{aligned}$$

It follows that $\left\{\int_a^b f_n(x)\,dx\right\}_{n=1}^\infty$ is a Cauchy sequence.


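The Cauchy Completeness Theorem (Corollary 8.3.16) implies that the sequence
$\left\{\int_a^b f_n(x)\,dx\right\}_{n=1}^\infty$ is convergent. Let $L = \lim_{n \to \infty} \int_a^b f_n(x)\,dx$. We now show that $f$ is
integrable and $\int_a^b f(x)\,dx = L$, which will complete the proof.

Let $\eta > 0$. Because $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$, there is some $M \in \mathbb{N}$
such that $n \in \mathbb{N}$ and $n \ge M$ imply $|f_n(x) - f(x)| < \frac{\eta}{3(b-a)}$ for all $x \in [a,b]$. Because
$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = L$, there is some $K \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge K$ imply
$\left|\int_a^b f_n(x)\,dx - L\right| < \frac{\eta}{3}$. Let $J = \max\{M, K\}$. Because $f_J$ is integrable, there is some
$\delta > 0$ such that if $P$ is a partition of $[a,b]$ with $\|P\| < \delta$, and if $T$ is a representative
set of $P$, then $\left|S(f_J, P, T) - \int_a^b f_J(x)\,dx\right| < \frac{\eta}{3}$.

Let $R$ be a partition of $[a,b]$ with $\|R\| < \delta$, and let $V$ be a representative set of $R$.
By Exercise 5.2.2, we see that

$$|S(f_J, R, V) - S(f, R, V)| < \frac{\eta}{3(b-a)} \cdot (b-a) = \frac{\eta}{3}.$$

Then

$$\begin{aligned} |S(f, R, V) - L| &= \left|S(f, R, V) - S(f_J, R, V) + S(f_J, R, V) - \int_a^b f_J(x)\,dx + \int_a^b f_J(x)\,dx - L\right| \\ &\le |S(f, R, V) - S(f_J, R, V)| + \left|S(f_J, R, V) - \int_a^b f_J(x)\,dx\right| + \left|\int_a^b f_J(x)\,dx - L\right| \\ &< \frac{\eta}{3} + \frac{\eta}{3} + \frac{\eta}{3} = \eta. \end{aligned}$$

Therefore $f$ is integrable and $\int_a^b f(x)\,dx = L$.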


The displayed equation in Theorem 10.2.11 can be written as

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b \lim_{n \to \infty} f_n(x)\,dx,$$

which is often stated informally by saying that "the limit passes through the integral
sign." We need to proceed with caution in such situations, however, because we saw
in Example 10.2.4 (4) that the limit does not pass through the integral sign in general,
and does so only under suitable hypotheses (namely, uniform convergence).
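As a numerical illustration of Theorem 10.2.11, consider the functions $f_n(x) = x + \frac{\sin(nx)}{n}$ on $[0,1]$. This example is an assumption introduced here and is not taken from the text. It satisfies $|f_n(x) - x| \le \frac{1}{n}$ for all $x$, so $\{f_n\}_{n=1}^\infty$ converges uniformly to $f(x) = x$, whose integral over $[0,1]$ is $\frac{1}{2}$. The Python sketch below approximates the integrals with a simple midpoint rule, so the printed numbers are approximations rather than exact values.

```python
import math

def midpoint_integral(func, a, b, pieces=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / pieces
    return width * sum(func(a + (i + 0.5) * width) for i in range(pieces))

def f_n(n):
    """Hypothetical example f_n(x) = x + sin(n x) / n, with |f_n(x) - x| <= 1/n."""
    return lambda x: x + math.sin(n * x) / n

limit_integral = 0.5                 # integral of f(x) = x over [0, 1]
indices = (1, 10, 100, 1000)

integral_gaps = []
for n in indices:
    approx = midpoint_integral(f_n(n), 0.0, 1.0)
    integral_gaps.append(abs(approx - limit_integral))

for n, gap in zip(indices, integral_gaps):
    print(f"n = {n}: |integral of f_n - 1/2| is about {gap:.2e}")
```

Each gap is bounded by $\frac{1}{n} \cdot (1 - 0)$, the same estimate used in the proof, up to the small error of the numerical integration.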

As seen in Example 10.2.8 (1), differentiability does not work as nicely with 
respect to uniform convergence as do continuity and integrability. For that reason, 
the following theorem has stronger hypotheses than Theorem 10.2.10 and Theo- 
rem 10.2.11. 


Theorem 10.2.12. Let $I \subseteq \mathbb{R}$ be a non-degenerate open bounded interval, and let
$\{f_n\}_{n=1}^\infty$ be a sequence of functions $I \to \mathbb{R}$. Suppose that $f_n$ is differentiable for all
$n \in \mathbb{N}$, that $\{f_n'\}_{n=1}^\infty$ is uniformly convergent and that $\{f_n(c)\}_{n=1}^\infty$ is convergent for
some $c \in I$. Then there is a function $f\colon I \to \mathbb{R}$ such that $f$ is differentiable, that
$\{f_n\}_{n=1}^\infty$ converges uniformly to $f$ and that $\{f_n'\}_{n=1}^\infty$ converges uniformly to $f'$.


Proof. We start with a preliminary step. Let $n, m \in \mathbb{N}$, and let $x, y \in I$. Suppose that
$x \ne y$. Because $f_n$ and $f_m$ are differentiable on $I$, then Theorem 4.3.1 (2) implies
that $f_n - f_m$ is differentiable on $I$, and Theorem 4.2.4 then implies that $f_n - f_m$ is
continuous on $I$. We can therefore apply the Mean Value Theorem (Theorem 4.4.4) to
$f_n - f_m$ restricted to the closed bounded interval from $x$ to $y$ (we do not know which
of $x$ or $y$ is larger, but it does not matter). Hence there is some $d$ strictly between $x$
and $y$ such that

$$f_n'(d) - f_m'(d) = \frac{[f_n(x) - f_m(x)] - [f_n(y) - f_m(y)]}{x - y}. \qquad (10.2.1)$$

We now show that $\{f_n\}_{n=1}^\infty$ is uniformly convergent. Let $I = (a,b)$. Let $\varepsilon > 0$.
Because $\{f_n(c)\}_{n=1}^\infty$ is convergent, then by the Cauchy Completeness Theorem
(Corollary 8.3.16) we know that $\{f_n(c)\}_{n=1}^\infty$ is a Cauchy sequence. Hence there is some
$N \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n, m \ge N$ imply $|f_n(c) - f_m(c)| < \frac{\varepsilon}{2}$. Because $\{f_n'\}_{n=1}^\infty$
is uniformly convergent, then by the Cauchy Criterion for Uniform Convergence
(Theorem 10.2.9) there is some $M \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n, m \ge M$ imply
$|f_n'(x) - f_m'(x)| < \frac{\varepsilon}{2(b-a)}$ for all $x \in I$. Let $P = \max\{N, M\}$.

Suppose that $n, m \in \mathbb{N}$ and $n, m \ge P$. Let $z \in I$. There are two cases. First, suppose
that $z = c$. Then $|f_n(z) - f_m(z)| = |f_n(c) - f_m(c)| < \frac{\varepsilon}{2} < \varepsilon$. Second, suppose that $z \ne c$.
By the preliminary step, using $x = z$ and $y = c$, there is some $q$ strictly between $z$ and
$c$ such that Equation 10.2.1 holds, using $d = q$. Then

$$\begin{aligned} |f_n(z) - f_m(z)| &= \bigl|[f_n(z) - f_m(z)] - [f_n(c) - f_m(c)] + [f_n(c) - f_m(c)]\bigr| \\ &\le \bigl|[f_n(z) - f_m(z)] - [f_n(c) - f_m(c)]\bigr| + |f_n(c) - f_m(c)| \\ &= |f_n'(q) - f_m'(q)| \cdot |z - c| + |f_n(c) - f_m(c)| \\ &< \frac{\varepsilon}{2(b-a)} \cdot (b-a) + \frac{\varepsilon}{2} = \varepsilon. \end{aligned}$$
Putting the two cases together, we can apply the Cauchy Criterion for Uniform
Convergence to $\{f_n\}_{n=1}^\infty$, and we deduce that $\{f_n\}_{n=1}^\infty$ is uniformly convergent. Hence
there is a function $f\colon I \to \mathbb{R}$ such that $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$.

Let $p \in I$. We will show that $f$ is differentiable at $p$, and that $\lim_{n \to \infty} f_n'(p) = f'(p)$.
First, however, we need another preliminary step. For each $n \in \mathbb{N}$, let $g_n\colon I \to \mathbb{R}$ be
defined by

$$g_n(x) = \begin{cases} \dfrac{f_n(x) - f_n(p)}{x - p}, & \text{if } x \in I - \{p\} \\ f_n'(p), & \text{if } x = p. \end{cases}$$

We have therefore defined a sequence of functions $\{g_n\}_{n=1}^\infty$. We will show that $\{g_n\}_{n=1}^\infty$
is uniformly convergent.

Let $\eta > 0$. Because $\{f_n'\}_{n=1}^\infty$ is uniformly convergent, then by the Cauchy Criterion
for Uniform Convergence there is some $Q \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n, m \ge Q$ imply
$|f_n'(x) - f_m'(x)| < \eta$ for all $x \in I$. Suppose that $n, m \in \mathbb{N}$ and $n, m \ge Q$. Let $w \in I$. There
are two cases. First, suppose that $w = p$. Then $|g_n(w) - g_m(w)| = |g_n(p) - g_m(p)| =
|f_n'(p) - f_m'(p)| < \eta$. Second, suppose that $w \ne p$. By the first preliminary step, using
$x = w$ and $y = p$, there is some $r$ strictly between $w$ and $p$ such that Equation 10.2.1
holds, using $d = r$. Then

$$\begin{aligned} |g_n(w) - g_m(w)| &= \left|\frac{f_n(w) - f_n(p)}{w - p} - \frac{f_m(w) - f_m(p)}{w - p}\right| \\ &= \left|\frac{[f_n(w) - f_m(w)] - [f_n(p) - f_m(p)]}{w - p}\right| = |f_n'(r) - f_m'(r)| < \eta. \end{aligned}$$

Putting the two cases together, we can apply the Cauchy Criterion for Uniform
Convergence to $\{g_n\}_{n=1}^\infty$, and we deduce that $\{g_n\}_{n=1}^\infty$ is uniformly convergent. Hence
there is a function $g\colon I \to \mathbb{R}$ such that $\{g_n\}_{n=1}^\infty$ converges uniformly to $g$.

Let $k \in \mathbb{N}$. Because $f_k$ is differentiable at $p$, we know that

$$\lim_{x \to p} \frac{f_k(x) - f_k(p)}{x - p}$$

exists and equals $f_k'(p)$. It therefore follows from Lemma 3.3.2 that $g_k$ is continuous
at $p$. We deduce from Theorem 10.2.10 (1) that $g$ is continuous at $p$. It follows from
Lemma 3.3.2 that $\lim_{x \to p} g(x) = g(p)$.


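Because $\{f_n'\}_{n=1}^\infty$ is uniformly convergent, then by Lemma 10.2.6 we know that
$\{f_n'\}_{n=1}^\infty$ is pointwise convergent, and hence $\{f_n'(p)\}_{n=1}^\infty$ is convergent. It then follows
from Theorem 8.2.9 that

$$\begin{aligned} \lim_{x \to p} \frac{f(x) - f(p)}{x - p} &= \lim_{x \to p} \lim_{n \to \infty} \frac{f_n(x) - f_n(p)}{x - p} = \lim_{x \to p} \lim_{n \to \infty} g_n(x) = \lim_{x \to p} g(x) \\ &= g(p) = \lim_{n \to \infty} g_n(p) = \lim_{n \to \infty} f_n'(p). \end{aligned}$$

We deduce that $f$ is differentiable at $p$ and $f'(p) = \lim_{n \to \infty} f_n'(p)$.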


It follows that $f$ is differentiable, and that $\{f_n'\}_{n=1}^\infty$ converges pointwise to $f'$.
Because $\{f_n'\}_{n=1}^\infty$ is uniformly convergent by hypothesis, we then use Lemma 10.2.3
and Lemma 10.2.6 to conclude that $\{f_n'\}_{n=1}^\infty$ converges uniformly to $f'$.

The conclusion of Theorem 10.2.12 can be summarized by writing

$$\lim_{n \to \infty} f_n'(x) = \Bigl[\lim_{n \to \infty} f_n(x)\Bigr]'$$

for all $x \in I$, though, as before, we need to proceed with caution when writing such
expressions, because they hold only under certain hypotheses.
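The role of the hypothesis on the derivatives can be illustrated with two families of functions on $(0,1)$; both are assumptions chosen here for illustration and do not come from the text. For $f_n(x) = \frac{\sin(nx)}{n^2}$ the derivatives $f_n'(x) = \frac{\cos(nx)}{n}$ satisfy $|f_n'(x)| \le \frac{1}{n}$, so they converge uniformly to the zero function, as do the functions themselves. For $s_n(x) = \frac{\sin(nx)}{n}$ the functions still converge uniformly to zero, but the derivatives $s_n'(x) = \cos(nx)$ do not shrink. The Python sketch below samples both sequences of derivatives on a finite grid, which only approximates the behavior on the whole interval.

```python
import math

grid = [i / 1000 for i in range(1, 1000)]   # finite sample of the open interval (0, 1)

def sup_on_grid(func):
    """Largest value of |func(x)| over the sample grid."""
    return max(abs(func(x)) for x in grid)

def f_prime(n):
    """Derivative of the hypothetical f_n(x) = sin(n x) / n**2."""
    return lambda x: math.cos(n * x) / n

def s_prime(n):
    """Derivative of the hypothetical s_n(x) = sin(n x) / n."""
    return lambda x: math.cos(n * x)

sup_f_prime = {n: sup_on_grid(f_prime(n)) for n in (1, 10, 100)}
s_prime_at_half = [s_prime(n)(0.5) for n in range(100, 120)]

print("sampled sup of |f_n'|:", sup_f_prime)               # bounded by 1/n
print("s_n'(0.5) for n = 100, ..., 119:", s_prime_at_half)  # keeps oscillating
```

In the first case the limit of the derivatives is the derivative of the limit, as Theorem 10.2.12 guarantees. In the second case the sampled values of $s_n'(0.5)$ keep oscillating over the range shown, so the theorem's hypothesis on the derivatives appears to fail there.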

The reader might be puzzled about the hypothesis that $\{f_n(c)\}_{n=1}^\infty$ is convergent
for some $c \in I$ in the statement of Theorem 10.2.12; it seems strange to require such a
hypothesis for only one number in $I$. This hypothesis cannot be dropped, however, as
the reader is asked to show in Exercise 10.2.8.


Reflections 


The material in this section might appear upon first encounter to be somewhat 
dry and technical. However, this section contains some examples of counterintuitive 
behavior, namely, the existence of a sequence of continuous functions that converges 
(pointwise) to a discontinuous function (and similarly for integrable or differentiable 
functions), and seeing such counterintuitive behavior is, in addition to our use of 
rigorous proofs, one of the features that distinguishes real analysis from introductory 
calculus. Moreover, the distinction between pointwise convergence and uniform 
convergence of sequences of functions, and the better behavior of the latter, is crucial 
for the remaining sections of this chapter, which include discussion of power series 
and a continuous but nowhere differentiable function. 


Exercises 


Exercise 10.2.1. [Used in Example 10.2.4.] 


(1) Let $\{h_n\}_{n=1}^\infty$ and $h$ be the functions given in Example 10.2.4 (3). Prove that
$\{h_n\}_{n=1}^\infty$ converges pointwise to $h$.

(2) Let $\{p_n\}_{n=2}^\infty$ and $p$ be the functions given in Example 10.2.4 (4). Prove that
$\{p_n\}_{n=2}^\infty$ converges pointwise to $p$.


Exercise 10.2.2. 


(1) Let $\{h_n\}_{n=1}^\infty$ and $h$ be the functions given in Example 10.2.4 (3). Using only
the definition of uniform convergence, prove that $\{h_n\}_{n=1}^\infty$ is not uniformly
convergent.

(2) Let $\{p_n\}_{n=2}^\infty$ and $p$ be the functions given in Example 10.2.4 (4). Using only
the definition of uniform convergence, prove that $\{p_n\}_{n=2}^\infty$ is not uniformly
convergent.


Exercise 10.2.3. For eachn €N, let f,: [0,1] — R be defined by f,(x) = a for all 
$x \in [0,1]$. Is $\{f_n\}_{n=1}^\infty$ pointwise convergent, uniformly convergent or neither? Prove
your answer.



Exercise 10.2.4. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$ and $\{g_n\}_{n=1}^\infty$ be sequences
of functions $A \to \mathbb{R}$, let $f, g\colon A \to \mathbb{R}$ be functions and let $k \in \mathbb{R}$. Suppose that $\{f_n\}_{n=1}^\infty$
and $\{g_n\}_{n=1}^\infty$ converge pointwise to $f$ and $g$, respectively.

(1) Prove that $\{f_n + g_n\}_{n=1}^\infty$ converges pointwise to $f + g$.
(2) Prove that $\{k f_n\}_{n=1}^\infty$ converges pointwise to $kf$.
(3) Prove that $\{f_n g_n\}_{n=1}^\infty$ converges pointwise to $fg$.


Exercise 10.2.5. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$ be a sequence of functions
$A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. Suppose that $\{f_n\}_{n=1}^\infty$ converges pointwise to
$f$.


(1) Suppose that $f_n$ is increasing for each $n \in \mathbb{N}$. Is $f$ necessarily increasing? Give
a proof or a counterexample.

(2) Suppose that $f_n$ is bounded for each $n \in \mathbb{N}$. Is $f$ necessarily bounded? Give a
proof or a counterexample.


Exercise 10.2.6. [Used in Exercise 10.3.7.] Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$
and $\{g_n\}_{n=1}^\infty$ be sequences of functions $A \to \mathbb{R}$, let $f, g\colon A \to \mathbb{R}$ be functions and let
$k \in \mathbb{R}$. Suppose that $\{f_n\}_{n=1}^\infty$ and $\{g_n\}_{n=1}^\infty$ converge uniformly to $f$ and $g$, respectively.


(1) Prove that $\{f_n + g_n\}_{n=1}^\infty$ converges uniformly to $f + g$.

(2) Prove that $\{k f_n\}_{n=1}^\infty$ converges uniformly to $kf$.

(3) Prove that if $f_n$ is bounded for each $n \in \mathbb{N}$, then $f$ is bounded.

(4) Prove that if $f$ is bounded, then there are $N \in \mathbb{N}$ and $M \in \mathbb{R}$ such that $|f_n(x)| \le M$
for all $x \in A$ and all $n \in \mathbb{N}$ such that $n \ge N$.

(5) Prove that if $f$ and $g$ are bounded, then $\{f_n g_n\}_{n=1}^\infty$ converges uniformly to $fg$.

(6) Find an example of $\{f_n\}_{n=1}^\infty$ and $\{g_n\}_{n=1}^\infty$ such that $\{f_n g_n\}_{n=1}^\infty$ does not
converge uniformly to $fg$.


Exercise 10.2.7. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\{f_n\}_{n=1}^\infty$ be a sequence of functions
$A \to \mathbb{R}$, let $f\colon A \to \mathbb{R}$ be a function and let $g\colon \mathbb{R} \to \mathbb{R}$ be a function. Prove that if
$\{f_n\}_{n=1}^\infty$ converges uniformly to $f$, and if $g$ is uniformly continuous, then $\{g \circ f_n\}_{n=1}^\infty$
converges uniformly to $g \circ f$.


Exercise 10.2.8. [Used in Section 10.2.] Find an example of a sequence $\{f_n\}_{n=1}^\infty$ of
functions $(0,1) \to \mathbb{R}$ such that $f_n$ is differentiable for each $n \in \mathbb{N}$, that $\{f_n'\}_{n=1}^\infty$ is
uniformly convergent and that $\{f_n\}_{n=1}^\infty$ is not pointwise convergent.


Exercise 10.2.9. Let $C \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$g, h\colon C \to \mathbb{R}$ be functions. Suppose that $g$ and $h$ are continuous. The distance from
$g$ to $h$, denoted $\|g - h\|$, is defined by $\|g - h\| = \operatorname{lub}\{|g(x) - h(x)| \mid x \in C\}$; this least
upper bound exists by the Extreme Value Theorem (Theorem 3.5.1).

Let $\{f_n\}_{n=1}^\infty$ be a sequence of functions $C \to \mathbb{R}$, and let $f\colon C \to \mathbb{R}$ be a function.
Suppose that $f_n$ is continuous for all $n \in \mathbb{N}$, and that $f$ is continuous. Prove that
$\{f_n\}_{n=1}^\infty$ converges uniformly to $f$ if and only if $\lim_{n \to \infty} \|f_n - f\| = 0$.


Exercise 10.2.10. [Used in Example 10.2.4.] A sequence of functions is a collection
of functions indexed by the natural numbers. It is also possible to have a collection of
functions that is doubly indexed by the natural numbers, which is equivalent to being
indexed by $\mathbb{N} \times \mathbb{N}$. Such a collection is denoted $\{f_{n,m}\}_{n,m=1}^\infty$. Find an example of such
a doubly indexed collection of functions $\mathbb{R} \to \mathbb{R}$ such that for each $n \in \mathbb{N}$ the sequence
$\{f_{n,m}\}_{m=1}^\infty$ is pointwise convergent and for each $m \in \mathbb{N}$ the sequence $\{f_{n,m}\}_{n=1}^\infty$ is
pointwise convergent, and that for some $x \in \mathbb{R}$ the sequence $\left\{\lim_{m \to \infty} f_{n,m}(x)\right\}_{n=1}^\infty$ is
convergent and the sequence $\left\{\lim_{n \to \infty} f_{n,m}(x)\right\}_{m=1}^\infty$ is convergent, but

$$\lim_{n \to \infty} \lim_{m \to \infty} f_{n,m}(x) \ne \lim_{m \to \infty} \lim_{n \to \infty} f_{n,m}(x).$$


Exercise 10.2.11. [Used in Section 10.2.] Let $A \subseteq \mathbb{R}$ be a non-empty set, let $c \in A$, let
$\{f_n\}_{n=1}^\infty$ be a sequence of functions $A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. Suppose
that $\{f_n\}_{n=1}^\infty$ converges pointwise to $f$, and that $f_n$ is continuous at $c$ for all $n \in \mathbb{N}$.
Prove that $f$ is continuous at $c$ if and only if for each $\varepsilon > 0$ and each $M \in \mathbb{N}$, there is
some $\delta > 0$ and some $p \in \mathbb{N}$ such that $p \ge M$ and that $x \in A$ and $|x - c| < \delta$ imply
$|f_p(x) - f(x)| < \varepsilon$.

Exercise 10.2.12. The purpose of this exercise is to show that if the hypotheses of 
Theorem 10.2.12 are slightly strengthened, then a simpler proof of the conclusion of 
that theorem can be given. This simpler proof does not resemble the proof of Theo- 
rem 10.2.12, but rather uses the Fundamental Theorem of Calculus, both versions. 

Let $I \subseteq \mathbb{R}$ be a non-degenerate open bounded interval, and let $\{f_n\}_{n=1}^\infty$ be a
sequence of functions $I \to \mathbb{R}$. Suppose that $f_n$ is continuously differentiable for all
$n \in \mathbb{N}$, that $\{f_n'\}_{n=1}^\infty$ is uniformly convergent and that $\{f_n(c)\}_{n=1}^\infty$ is convergent for some
$c \in I$. Prove that there is a function $f\colon I \to \mathbb{R}$ such that $f$ is differentiable, and $\{f_n\}_{n=1}^\infty$
converges uniformly to $f$, and $\{f_n'\}_{n=1}^\infty$ converges uniformly to $f'$. (Observe that
Theorem 10.2.12 has the weaker hypothesis that $f_n$ is required to be only differentiable,
rather than continuously differentiable, for all $n \in \mathbb{N}$.)

To prove the result, let $I = (a,b)$, let $L = \lim_{n \to \infty} f_n(c)$ and let $h\colon I \to \mathbb{R}$ be the
function such that $\{f_n'\}_{n=1}^\infty$ converges uniformly to $h$. Because $f_n'$ is continuous for
all $n \in \mathbb{N}$, then Theorem 10.2.10 (2) implies that $h$ is continuous, and hence $h$ is
integrable by Theorem 5.4.11. Let $f\colon I \to \mathbb{R}$ be defined by $f(x) = L + \int_c^x h(t)\,dt$ for
all $x \in I$. Prove that $f$ has the desired properties.


10.3 Series of Functions 


Having looked at sequences of functions in Section 10.2, we now turn to series of 
functions. Analogously to series of numbers, a series of functions is a formal infinite 
sum of functions, which might or might not actually add up to a function. 


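Definition 10.3.1. Let $A \subseteq \mathbb{R}$ be a non-empty set. A series of functions $A \to \mathbb{R}$ is a
formal sum

$$\sum_{n=1}^\infty f_n = f_1 + f_2 + f_3 + \cdots,$$

where $\{f_n\}_{n=1}^\infty$ is a sequence of functions $A \to \mathbb{R}$. Each function $f_n$, where $n \in \mathbb{N}$, is
called a term of the series $\sum_{n=1}^\infty f_n$.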


Analogously to series of numbers, the convergence of series of functions is defined 
in terms of the sequence of partial sums. Hence, just as we have both pointwise 
convergence and uniform convergence of sequences of functions, we will also have 
pointwise convergence and uniform convergence of series of functions.

Definition 10.3.2. Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $\sum_{n=1}^\infty f_n$ be a series of
functions $A \to \mathbb{R}$.


1. For each $k \in \mathbb{N}$, the $k$th partial sum of $\sum_{n=1}^\infty f_n$, denoted $s_k$, is defined by
$s_k = \sum_{i=1}^k f_i$. The sequence of partial sums of $\sum_{n=1}^\infty f_n$ is the sequence $\{s_n\}_{n=1}^\infty$.

2. The series of functions $\sum_{n=1}^\infty f_n$ is pointwise convergent if the sequence of
partial sums $\{s_n\}_{n=1}^\infty$ is pointwise convergent. If $\{s_n\}_{n=1}^\infty$ converges pointwise
to a function $f\colon A \to \mathbb{R}$, we say that $\sum_{n=1}^\infty f_n$ converges pointwise to $f$.

3. The series of functions $\sum_{n=1}^\infty f_n$ is uniformly convergent if the sequence of
partial sums $\{s_n\}_{n=1}^\infty$ is uniformly convergent. If $\{s_n\}_{n=1}^\infty$ converges uniformly
to a function $f\colon A \to \mathbb{R}$, we say that $\sum_{n=1}^\infty f_n$ converges uniformly to $f$. △
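The partial sums in Definition 10.3.2 are themselves functions, which can be made concrete in code. The following Python sketch is one possible way to model this, under the assumption that the terms of the series are supplied by a callable `term` with `term(i)` returning the function $f_i$. The particular series used, with $f_n(x) = \frac{x^n}{n^2}$, is a hypothetical example and not one from the text.

```python
def partial_sum(term, k):
    """Return the k-th partial sum s_k = f_1 + ... + f_k as a function.

    `term(i)` is assumed to return the i-th term f_i of the series.
    """
    def s_k(x):
        return sum(term(i)(x) for i in range(1, k + 1))
    return s_k

def term(n):
    """Hypothetical term f_n(x) = x**n / n**2."""
    return lambda x: x ** n / n ** 2

s1 = partial_sum(term, 1)
s3 = partial_sum(term, 3)

print(s1(0.5))   # f_1(0.5)
print(s3(1.0))   # f_1(1) + f_2(1) + f_3(1) = 1 + 1/4 + 1/9
```

Pointwise and uniform convergence of the series are then statements about the sequence of functions `partial_sum(term, 1)`, `partial_sum(term, 2)`, and so on.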


As with series of numbers, we note that a series of functions does not have to 
start with n = 1, and that the convergence of a series of functions is unaffected by 
changing, or dropping, finitely many terms of the series. 

As with the convergence of sequences of functions, it is similarly the case that
uniform convergence of series of functions behaves more nicely than pointwise
convergence of series of functions.

The following two lemmas are derived immediately from Lemma 10.2.6 and
Lemma 10.2.3, respectively, and we omit the proofs.


Lemma 10.3.3. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\sum_{n=1}^\infty f_n$ be a series of functions
$A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. If $\sum_{n=1}^\infty f_n$ converges uniformly to $f$, then
$\sum_{n=1}^\infty f_n$ converges pointwise to $f$.


Lemma 10.3.4. Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $\sum_{n=1}^\infty f_n$ be a series of functions
$A \to \mathbb{R}$. If $\sum_{n=1}^\infty f_n$ converges pointwise or converges uniformly to $f$ for some function
$f\colon A \to \mathbb{R}$, then $f$ is unique.


We will see in Example 10.3.7 that pointwise convergence of a series of functions 
does not necessarily imply uniform convergence. However, just as it is not always 
easy to prove that a series of numbers is convergent directly by the definition, which 
is why we need the various convergence tests given in Sections 9.3 and 9.4, it is also 
not always easy to prove uniform convergence of a series of functions directly by 
the definition. Hence, before giving an example of uniform convergence of a series 
of functions, we prove the following two convergence tests. The first of our tests 
is analogous to the Divergence Test for series of numbers (Theorem 9.2.5). We use 
the term “zero function” to refer to any function that is constantly zero (so that the 
codomain of the function must be a subset of R). 


Theorem 10.3.5 (Divergence Test for Series of Functions). Let $A \subseteq \mathbb{R}$ be a
non-empty set, and let $\sum_{n=1}^\infty f_n$ be a series of functions $A \to \mathbb{R}$.


1. If $\{f_n\}_{n=1}^\infty$ does not converge pointwise to the zero function, then $\sum_{n=1}^\infty f_n$ is
not pointwise convergent.

2. If $\{f_n\}_{n=1}^\infty$ does not converge uniformly to the zero function, then $\sum_{n=1}^\infty f_n$ is
not uniformly convergent.


Proof. Left to the reader in Exercise 10.3.3.

The following convergence test for series of functions is somewhat analogous
to the Comparison Test for series of numbers (Theorem 9.3.2), though instead of 
comparing one series of functions with another series of functions, we will compare a 
series of functions with a series of numbers. 


Theorem 10.3.6 (Weierstrass M-Test). Let $A \subseteq \mathbb{R}$ be a non-empty set, and let
$\sum_{n=1}^\infty f_n$ be a series of functions $A \to \mathbb{R}$. Suppose that for each $k \in \mathbb{N}$, there is some
$M_k \in \mathbb{R}$ such that $|f_k(x)| \le M_k$ for all $x \in A$. If $\sum_{k=1}^\infty M_k$ is convergent, then $\sum_{n=1}^\infty f_n$
is uniformly convergent.


Proof. Suppose that $\sum_{k=1}^\infty M_k$ is convergent. Observe that $M_k \ge 0$ for all $k \in \mathbb{N}$. Let
$\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty f_n$, and let $\{t_n\}_{n=1}^\infty$ be the sequence
of partial sums of $\sum_{n=1}^\infty M_n$.

Let $\varepsilon > 0$. By hypothesis the sequence $\{t_n\}_{n=1}^\infty$ is convergent, and it follows
from the Cauchy Completeness Theorem (Corollary 8.3.16) that $\{t_n\}_{n=1}^\infty$ is a Cauchy
sequence. Hence there is some $N \in \mathbb{N}$ such that $n, m \in \mathbb{N}$ and $n, m \ge N$ imply
$|t_n - t_m| < \varepsilon$.

Suppose that $n, m \in \mathbb{N}$ and $n, m \ge N$. Let $x \in A$. If $n = m$, then $|s_n(x) - s_m(x)| =
0 < \varepsilon$. Now suppose that $n \ne m$. Without loss of generality, assume that $n > m$. Using
Exercise 2.5.3, and the fact that $M_k \ge 0$ for all $k \in \mathbb{N}$, we see that

\[
|s_n(x) - s_m(x)| = \left| \sum_{i=1}^{n} f_i(x) - \sum_{i=1}^{m} f_i(x) \right| = \left| \sum_{i=m+1}^{n} f_i(x) \right| \le \sum_{i=m+1}^{n} |f_i(x)| \le \sum_{i=m+1}^{n} M_i
= \left| \sum_{i=1}^{n} M_i - \sum_{i=1}^{m} M_i \right| = |t_n - t_m| < \varepsilon.
\]

It now follows from the Cauchy Criterion for Uniform Convergence (Theorem 10.2.9)
that $\{s_n\}_{n=1}^\infty$ is uniformly convergent, and hence $\sum_{n=1}^\infty f_n$ is uniformly convergent.


Example 10.3.7. For each $n \in \mathbb{N} \cup \{0\}$, let $f_n\colon (-1,1) \to \mathbb{R}$ be defined by $f_n(x) = x^n$
for all $x \in (-1,1)$. We want to examine the convergence of the series $\sum_{n=0}^\infty f_n$, which
can be written as $\sum_{n=0}^\infty x^n$.

First, we show that the series $\sum_{n=0}^\infty f_n$ is pointwise convergent. Let $y \in (-1,1)$. By
Example 9.2.4 (4) we see that the series of numbers $\sum_{n=0}^\infty y^n$ is a geometric series, that
it is convergent and that $\sum_{n=0}^\infty y^n = \frac{1}{1-y}$. It follows that $\sum_{n=0}^\infty f_n$ converges pointwise
to the function $f\colon (-1,1) \to \mathbb{R}$ defined by $f(x) = \frac{1}{1-x}$ for all $x \in (-1,1)$.

Next, we show that $\sum_{n=0}^\infty f_n$ is not uniformly convergent. Let $g\colon (-1,1) \to \mathbb{R}$ be
defined by $g(x) = 0$ for all $x \in (-1,1)$. The same argument used in Example 10.2.8 (2)
can be used to show that $\{f_n\}_{n=0}^\infty$ does not converge uniformly to $g$. It now follows
from the Divergence Test for Series of Functions (Theorem 10.3.5) that $\sum_{n=0}^\infty f_n$ is not
uniformly convergent. Hence, pointwise convergence of a series of functions does not
necessarily imply uniform convergence.

Finally, let $b \in (0,1)$. For each $n \in \mathbb{N} \cup \{0\}$, let $h_n = f_n|_{[-b,b]}$. We will show that $\sum_{n=0}^\infty h_n$
is uniformly convergent. For each $n \in \mathbb{N} \cup \{0\}$, let $M_n = b^n$. Using Lemma 2.3.9 (5), which
can be extended by recursion to any finite product of numbers, and Exercise 2.5.12
(2), we see that if $n \in \mathbb{N} \cup \{0\}$ and $x \in [-b,b]$, then $0 \le |x| \le b$, which implies that $|h_n(x)| =
|x^n| = |x|^n \le b^n = M_n$.

Also, observe that $\sum_{n=0}^\infty M_n = \sum_{n=0}^\infty b^n$ is convergent, again because it is a geometric
series, and $|b| < 1$. It now follows from the Weierstrass M-Test (Theorem 10.3.6)
that $\sum_{n=0}^\infty h_n$ is uniformly convergent. (The M-Test was stated for series that start at
$n = 1$ rather than $n = 0$, but that makes no difference.) We could also state this result by
saying that $\sum_{n=0}^\infty f_n$ “is uniformly convergent on $[-b,b]$.” ◊
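The contrast in Example 10.3.7 can be observed numerically. The following is a minimal sketch, not a proof: it samples finitely many points, and the helper names `partial_sum`, `sup_error` and `sample` are hypothetical names introduced only for this illustration. It compares the largest sampled error $|f(x) - s_n(x)|$ on $[-b,b]$ with $b = 0.5$ against the geometric tail bound $b^{n+1}/(1-b)$, and against the largest sampled error on a grid reaching close to the endpoints of $(-1,1)$.

```python
def partial_sum(x, n):
    """Return s_n(x) = x**0 + x**1 + ... + x**n."""
    return sum(x ** k for k in range(n + 1))

def sup_error(points, n):
    """Largest value of |1/(1-x) - s_n(x)| over the given sample points."""
    return max(abs(1.0 / (1.0 - x) - partial_sum(x, n)) for x in points)

def sample(left, right, count):
    """Return count evenly spaced points from left to right, endpoints included."""
    step = (right - left) / (count - 1)
    return [left + i * step for i in range(count - 1)] + [right]

b = 0.5
closed = sample(-b, b, 201)          # sample of [-b, b]
wide = sample(-0.999, 0.999, 201)    # sample of (-1, 1) reaching near the endpoints

for n in (5, 10, 20, 40):
    tail_bound = b ** (n + 1) / (1.0 - b)   # sum of M_k = b**k for k > n
    print(n, sup_error(closed, n), tail_bound, sup_error(wide, n))
```

On the sample of $[-b,b]$ the error stays below the tail bound and shrinks quickly, whereas on the wider sample it remains large for every $n$ tried, which is consistent with the failure of uniform convergence on $(-1,1)$.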


The series $\sum_{n=0}^\infty f_n$ in Example 10.3.7 is a power series, as defined in Section 9.5.
We now see in the following theorem that the behavior of this example is typical 
of all power series. This theorem, which is a nice application of the Weierstrass 
M-Test (Theorem 10.3.6), will be important in our further study of power series in 
Section 10.4. 


Theorem 10.3.8. Let $\sum_{n=0}^\infty c_n(x-a)^n$ be a power series in $\mathbb{R}$. Let $R$ be the radius of
convergence of $\sum_{n=0}^\infty c_n(x-a)^n$. Suppose that $R > 0$. If $P \in (0,R)$, then $\sum_{n=0}^\infty c_n(x-a)^n$
is uniformly convergent on $[a-P, a+P]$.


Proof. Let $P \in (0,R)$. Then $a + P \in (a-R, a+R)$, and hence $\sum_{n=0}^\infty c_n(x-a)^n$ is
absolutely convergent at $x = a + P$. Therefore $\sum_{n=0}^\infty c_n P^n = \sum_{n=0}^\infty c_n((a+P)-a)^n$ is
absolutely convergent, which means that $\sum_{n=0}^\infty |c_n| P^n$ is convergent.

Let $k \in \mathbb{N} \cup \{0\}$ and $x \in [a-P, a+P]$. Then $|x - a| \le P$, and it follows from
Lemma 2.3.9 (5) (extended by recursion to any finite product of numbers) and
Exercise 2.5.12 (2) that $|c_k(x-a)^k| \le |c_k| P^k$. We can now use the Weierstrass
M-Test (Theorem 10.3.6) with $M_n = |c_n| P^n$ for all $n \in \mathbb{N} \cup \{0\}$, and we deduce that
$\sum_{n=0}^\infty c_n(x-a)^n$ is uniformly convergent on $[a-P, a+P]$. (The Weierstrass M-Test
was stated for series that start at $n = 1$ rather than $n = 0$, but that makes no
difference.)


We conclude this section by showing that the expected behavior of uniformly con- 
vergent series of functions with respect to continuity, differentiability and integrability 
holds. 


Theorem 10.3.9. Let $A \subseteq \mathbb{R}$ be a non-empty set, let $\sum_{n=1}^\infty f_n$ be a series of functions
$A \to \mathbb{R}$ and let $f\colon A \to \mathbb{R}$ be a function. Suppose that $\sum_{n=1}^\infty f_n$ converges uniformly to
$f$.


1. Let $c \in A$. If $f_n$ is continuous at $c$ for all $n \in \mathbb{N}$, then $f$ is continuous at $c$.
2. If $f_n$ is continuous for all $n \in \mathbb{N}$, then $f$ is continuous.


Proof. Let $\{s_n\}_{n=1}^\infty$ be the sequence of partial sums of $\sum_{n=1}^\infty f_n$, so that $\{s_n\}_{n=1}^\infty$
converges uniformly to $f$. For each $k \in \mathbb{N}$, we know that $s_k$ is continuous, because it
is the sum of finitely many continuous functions. (In Theorem 3.3.5 (1) we saw that
the sum of two continuous functions is continuous, and that result can be extended by
recursion to any finite sum.) Both parts of this theorem now follow immediately from
Theorem 10.2.10 applied to $\{s_n\}_{n=1}^\infty$.


The next two theorems show that uniformly convergent series of functions, subject
to the appropriate hypotheses, can be integrated and differentiated term by term.
The proofs of these two theorems are completely analogous to the proof of
Theorem 10.3.9, this time relying upon Theorem 10.2.11 and Theorem 10.2.12, and we
omit the details. The unpleasant hypotheses in Theorem 10.3.11 correspond to the 
analogous hypotheses in Theorem 10.2.12. 


Theorem 10.3.10. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, let
$\sum_{n=1}^\infty f_n$ be a series of functions $[a,b] \to \mathbb{R}$ and let $f\colon [a,b] \to \mathbb{R}$ be a function. Suppose
that $\sum_{n=1}^\infty f_n$ converges uniformly to $f$. If $f_n$ is integrable for all $n \in \mathbb{N}$, then $f$ is
integrable and
\[
\int_a^b f(x)\,dx = \sum_{n=1}^\infty \int_a^b f_n(x)\,dx.
\]

Theorem 10.3.11. Let $I \subseteq \mathbb{R}$ be a non-degenerate open bounded interval, and let
$\sum_{n=1}^\infty f_n$ be a series of functions $I \to \mathbb{R}$. Suppose that $f_n$ is differentiable for all $n \in \mathbb{N}$,
that $\sum_{n=1}^\infty f_n'$ is uniformly convergent and that $\sum_{n=1}^\infty f_n(c)$ is convergent for some $c \in I$.
Then there is a function $f\colon I \to \mathbb{R}$ such that $f$ is differentiable, and $\sum_{n=1}^\infty f_n$ converges
uniformly to $f$, and $\sum_{n=1}^\infty f_n'$ converges pointwise to $f'$.


The displayed equation in Theorem 10.3.10 can be written as
\[
\int_a^b \left[ \sum_{n=1}^\infty f_n(x) \right] dx = \sum_{n=1}^\infty \int_a^b f_n(x)\,dx,
\]
and the conclusion of Theorem 10.3.11 can be summarized by writing
\[
\left[ \sum_{n=1}^\infty f_n(x) \right]' = \sum_{n=1}^\infty f_n'(x)
\]
for all $x \in I$, though caution is needed when writing both of these equations, because
they hold only under suitable hypotheses.


Reflections 


This section, in contrast to the previous one, contains no surprises. That is not due 
to the fact that series of functions are in some way better behaved than sequences of 
functions, but rather it is due to the fact that the same fundamental issue, namely, the 
difference between pointwise convergence and uniform convergence, is the basis of the 
behavior of both sequences and series of functions. Because the distinction between 
these two types of convergence was first discussed in the context of sequences of 
functions, the surprising behavior associated with this distinction was also discussed 
there, and there is no need to repeat it in the present section. As we will see in the 
next two sections, it is series of functions rather than sequences of functions that 
are of greater immediate use to us. Of course, as with series of numbers, so too the 
convergence of series of functions cannot be treated if the convergence of sequences 
of functions has not been previously discussed.

Exercises


Exercise 10.3.1. For each $n \in \mathbb{N}$, let $f_n\colon [1,\infty) \to \mathbb{R}$ be defined by $f_n(x) = 3^{-nx}$ for
all $x \in [1,\infty)$. Prove that $\sum_{n=1}^\infty f_n = \sum_{n=1}^\infty 3^{-nx}$ is uniformly convergent.


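Exercise 10.3.2. Let $\sum_{n=1}^\infty a_n$ be a series in $\mathbb{R}$. For each $n \in \mathbb{N}$, let $g_n\colon [-1,1] \to \mathbb{R}$
be defined by $g_n(x) = a_n x^n$ for all $x \in [-1,1]$. Prove that if $\sum_{n=1}^\infty a_n$ is absolutely
convergent, then $\sum_{n=1}^\infty g_n = \sum_{n=1}^\infty a_n x^n$ is uniformly convergent.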


Exercise 10.3.3. [Used in Theorem 10.3.5.] Prove Theorem 10.3.5. 


Exercise 10.3.4. Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $\sum_{n=1}^\infty f_n$ and $\sum_{n=1}^\infty g_n$ be series
of functions $A \to \mathbb{R}$. Suppose that $f_n(x) \ge 0$ and $g_n(x) \ge 0$ for all $x \in A$ and all $n \in \mathbb{N}$,
and that there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $f_n(x) \le g_n(x)$ for
all $x \in A$. Prove that if $\sum_{n=1}^\infty g_n$ is uniformly convergent, then $\sum_{n=1}^\infty f_n$ is uniformly
convergent. (This exercise is the analog for uniform convergence of the Comparison
Test (Theorem 9.3.2).)


Exercise 10.3.5. Let $[a,b] \subseteq \mathbb{R}$ be a non-degenerate closed bounded interval, and let
$\sum_{n=1}^\infty f_n$ be a series of functions $[a,b] \to \mathbb{R}$. Suppose that $f_n$ is increasing for all $n \in \mathbb{N}$, that
$f_n(x) \ge 0$ for all $x \in [a,b]$ and all $n \in \mathbb{N}$, and that the sequence $\{f_n(x)\}_{n=1}^\infty$ is decreasing for
all $x \in [a,b]$. Prove that if $\sum_{n=1}^\infty (-1)^{n-1} f_n$ is pointwise convergent, then $\sum_{n=1}^\infty (-1)^{n-1} f_n$
is uniformly convergent. [Use Exercise 8.2.13 (2).]


Exercise 10.3.6. 


(1) Let $A \subseteq \mathbb{R}$ be a non-empty set, and let $\sum_{n=1}^\infty f_n$ be a series of functions $A \to \mathbb{R}$.
Suppose that $i, j \in \mathbb{N}$ and $i \ne j$ imply that $\{x \in A \mid f_i(x) \ne 0\} \cap \{x \in A \mid f_j(x) \ne
0\} = \emptyset$, and that for each $k \in \mathbb{N}$, there is some $M_k \in \mathbb{R}$ such that $|f_k(x)| \le M_k$
for all $x \in A$. Prove that if $\{M_n\}_{n=1}^\infty$ is convergent and $\lim_{n\to\infty} M_n = 0$, then
$\sum_{n=1}^\infty f_n$ is uniformly convergent.

(2) Find an example of a series $\sum_{n=1}^\infty g_n$ of functions $[0,1] \to \mathbb{R}$ such that $\sum_{n=1}^\infty g_n$
is uniformly convergent, and that $\sum_{n=1}^\infty g_n$ does not satisfy the hypotheses of
the M-Test (Theorem 10.3.6).


Exercise 10.3.7. The reader is undoubtedly familiar with the importance of differ- 
ential equations in many applications of mathematics. In addition to their practical 
use, however, differential equations can also be studied from a theoretical point of 
view. The most fundamental theoretical question about differential equations is to 
find criteria that guarantee that certain of them have solutions, and even better unique 
solutions; such an existence question is quite distinct from the practical question of 
how to find formulas for the solutions of specific differential equations. 

We are interested here in ordinary differential equations, which are differential 
equations with a single variable. The standard formulation of an ordinary differential 
equation with initial condition is $y' = f(x,y)$ and $y(a) = b$, where $f$ is some
appropriate function of two variables. For example, if $f(x,y) = 5y$ for all $(x,y)$ in some
appropriate subset of $\mathbb{R}^2$, and if $a = 0$ and $b = 3$, then the differential equation with
initial condition becomes $y' = 5y$ and $y(0) = 3$, which, as the reader might recognize,
has solution $y = 3e^{5x}$.

Because we do not discuss functions of more than one variable in this text,
we cannot treat such a general formulation of differential equations; see [BD09, 
Section 2.8] or [Str00, Section 11.1] for a discussion of the general case. However, we 
can handle at present differential equations of the form y’ = f(y). The example y’ = 5y 
fits into this restricted formulation. The initial condition for our type of differential 
equation remains of the form y(a) = b. 

The purpose of this exercise is to prove that differential equations of the above 
type with initial conditions always have solutions if f is sufficiently well behaved, 
and if we are willing to restrict our attention to a sufficiently small interval containing 
a. This result is a special case of a more general existence and uniqueness theorem 
due to Charles Emile Picard (1856-1941); our proof uses the method known as Picard 
iteration, and includes all the ingredients of the proof of the more general existence 
result. Our theorem is as follows: 

Let $I \subseteq \mathbb{R}$ be an open interval, let $a \in \mathbb{R}$, let $b \in I$ and let $f\colon I \to \mathbb{R}$ be a function.
Suppose that there is some $K \in \mathbb{R}$ such that $|f(x) - f(y)| \le K|x - y|$ for all $x, y \in I$.
Then there is some $\delta > 0$ and a function $g\colon (a-\delta, a+\delta) \to I$ such that $g'(x) =
f(g(x))$ for all $x \in (a-\delta, a+\delta)$ and $g(a) = b$.

As discussed in Exercise 3.4.5, a function that satisfies the hypothesis involving
the number $K$ in this theorem is said to satisfy a Lipschitz condition.

The proof of the theorem will be done in steps. We start with a few preliminary
observations. First, it must be the case that $K \ge 0$, and we may assume that $K > 0$.
Second, it follows from Exercise 3.4.5 (1) and Lemma 3.4.2 that $f$ is continuous.
Third, by Lemma 2.3.7 (2) there is some $\eta > 0$ such that $[b-\eta, b+\eta] \subseteq I$. Then
Corollary 3.4.6 implies that there is some $M \in \mathbb{R}$ such that $|f(x)| \le M$ for all $x \in
[b-\eta, b+\eta]$. We may assume that $M > 0$.

Let $\delta = \min\left\{\frac{\eta}{M}, \frac{1}{2K}\right\}$. Then $(b - M\delta, b + M\delta) \subseteq [b-\eta, b+\eta] \subseteq I$.


(1) We define a sequence $\{g_n\}_{n=1}^\infty$ of functions $(a-\delta, a+\delta) \to I$ using Definition
by Recursion as follows. Let $g_1(x) = b$ for all $x \in (a-\delta, a+\delta)$. Because
$f \circ g_1$ is constant, it is continuous by Example 3.3.3 (1). Hence the restriction
of $f \circ g_1$ to each non-degenerate closed bounded interval in $(a-\delta, a+\delta)$
is continuous by Exercise 3.3.2 (2), and it follows from Theorem 5.4.11
that $f \circ g_1$ is integrable on every non-degenerate closed bounded interval in
$(a-\delta, a+\delta)$. Let
\[
g_2(x) = b + \int_a^x f(g_1(t))\,dt
\]
for all $x \in (a-\delta, a+\delta)$. By Theorem 5.3.1 (4) and Definition 5.5.8 we see
that $g_2(x) = b + f(b)(x-a)$ for all $x \in (a-\delta, a+\delta)$. Then $g_2$ is continuous
by Example 3.3.3 (1), and
\[
g_2(x) \in (b - |f(b)|\delta,\, b + |f(b)|\delta) \subseteq (b - M\delta,\, b + M\delta) \subseteq I
\]
for all $x \in (a-\delta, a+\delta)$.

Now suppose that we have defined $g_n$ for some $n \in \mathbb{N}$, and that $g_n$ is
continuous, and $g_n(x) \in I$ for all $x \in (a-\delta, a+\delta)$. Then $f \circ g_n$ is continuous
by Theorem 3.3.8 (3), and, as above, it follows that $f \circ g_n$ is integrable on
every non-degenerate closed bounded interval in $(a-\delta, a+\delta)$. Let
\[
g_{n+1}(x) = b + \int_a^x f(g_n(t))\,dt
\]
for all $x \in (a-\delta, a+\delta)$. Prove that $g_{n+1}$ is continuous, and that $g_{n+1}(x) \in I$
for all $x \in (a-\delta, a+\delta)$. We have then defined $\{g_n\}_{n=1}^\infty$.
(2) Prove that $|g_{n+1}(x) - g_n(x)| \le \frac{M}{K}(K\delta)^n$ for all $x \in (a-\delta, a+\delta)$ and all $n \in \mathbb{N}$.
(3) Prove that the series $\sum_{n=1}^\infty (g_{n+1} - g_n)$ is uniformly convergent.
(4) Prove that the sequence $\{g_n\}_{n=1}^\infty$ is uniformly convergent.
[Use Exercise 10.2.6 (1).]
(5) By Part (4) of this exercise $\{g_n\}_{n=1}^\infty$ converges uniformly to some function
$g\colon (a-\delta, a+\delta) \to \mathbb{R}$. Prove that $g$ is continuous, that $g(x) \in I$ for all $x \in
(a-\delta, a+\delta)$, that $f \circ g$ is integrable on every non-degenerate closed bounded
interval in $(a-\delta, a+\delta)$ and that $g(a) = b$.
(6) Prove that
\[
g(x) = b + \int_a^x f(g(t))\,dt
\]
for all $x \in (a-\delta, a+\delta)$. [Use Exercise 10.2.9.]
(7) Prove that $g$ is differentiable and $g'(x) = f(g(x))$ for all $x \in (a-\delta, a+\delta)$.
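The recursion in Part (1) can be tried out on the example $y' = 5y$, $y(0) = 3$ mentioned at the start of this exercise. The following is a minimal sketch under stated assumptions: $f(y) = 5y$, $a = 0$ and $b = 3$, so that every $g_n$ is a polynomial and the integral can be computed exactly on the list of coefficients. The helper names `picard_step` and `evaluate` are hypothetical, and the code illustrates the iteration only; it is not a substitute for the proof requested in the exercise.

```python
import math

def picard_step(coeffs, b, rate):
    """One Picard step for y' = rate * y with y(0) = b.

    coeffs[k] is the coefficient of x**k in g_n. The result holds the
    coefficients of g_{n+1}(x) = b + integral from 0 to x of rate * g_n(t) dt.
    """
    integrated = [rate * c / (k + 1) for k, c in enumerate(coeffs)]
    return [b] + integrated

def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x (Horner's rule)."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

b, rate = 3.0, 5.0    # the example y' = 5y, y(0) = 3
g = [b]               # g_1 is the constant function b
x = 0.1
for n in range(1, 13):
    value = evaluate(g, x)
    print(n, value, abs(value - 3.0 * math.exp(5.0 * x)))
    g = picard_step(g, b, rate)
```

In this particular example the iterates turn out to be the partial sums of the series $\sum_{k=0}^\infty \frac{3 \cdot 5^k}{k!} x^k$, and the printed differences from $3e^{5x}$ shrink rapidly at $x = 0.1$.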

We first encountered power series in Section 9.5, where we viewed power series in the
context of series in general. We now revisit power series, which we can study in more 
depth than before by making use of the tools we developed in the previous sections 
of this chapter; in particular, we make use of the notion of uniform convergence of 
series of functions. 

Not only do we have more tools at our disposal in the present section in comparison 
with Section 9.5, but, no less important, we adopt two changes in our point of view 
toward power series. First, as we mentioned at the beginning of Section 10.2, rather 
than viewing a power series such as 


\[
\sum_{n=0}^\infty c_n(x-a)^n = c_0 + c_1(x-a) + c_2(x-a)^2 + \cdots
\]


as a collection of series of numbers, one series for each value of x, we now view 
power series as a single series of functions. Second, whereas in Section 9.5 we started 
with power series, which we then viewed as functions, we now want to start with 
functions, such as $e^x$ and $\sin x$, that might not initially be given in terms of power
series, and we want to see if such functions can be represented as power series. In the 
process of discussing this matter, we will also answer some questions left unfinished in 
Section 9.5, such as whether power series, when viewed as functions, are continuous, 
differentiable and integrable. 

In keeping with our new point of view, we commence our current study of power 
series with the following definition.

Definition 10.4.1. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function. The
function $f$ is represented by a power series centered at $a$ if there is a power series
$\sum_{n=0}^\infty c_n(x-a)^n$ with non-degenerate interval of convergence $I$ such that $I \subseteq A$ and
$f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ for all $x \in I$. △


Observe in Definition 10.4.1 that the power series $\sum_{n=0}^\infty c_n(x-a)^n$ is required to
have a non-degenerate interval of convergence, which, as noted in Section 9.5, is 
equivalent to the requirement that the power series has positive radius of convergence. 
A power series with zero radius of convergence would represent a function at only one 
point, and as such would not be of any use. Moreover, note that in Definition 10.4.1 it 
is not required that the interval of convergence of the power series equals the domain 
of the function, but only that the interval of convergence is a subset of the domain. 
Although it is nice when the interval of convergence of the power series equals the 
domain of the function, as happens in some cases, we will see in Example 10.4.3 (1) 
that that is not always possible. 

As the reader might expect, if a function is represented by a power series, then that 
power series representation is unique. The following theorem is just a reformulation 
of Theorem 9.5.8 in the terminology of the present section, and we omit the proof. 


Theorem 10.4.2. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function. If $f$
is represented by a power series centered at $a$, then the power series is unique.


We are now ready for our first examples of functions represented by power series. 
Example 10.4.3. 


(1) Let $f\colon \mathbb{R} - \{1\} \to \mathbb{R}$ be defined by $f(x) = \frac{1}{1-x}$ for all $x \in \mathbb{R} - \{1\}$. We saw in
Example 9.5.2 (1) that the power series $\sum_{n=0}^\infty x^n$ has interval of convergence $(-1,1)$,
and that $\sum_{n=0}^\infty x^n = \frac{1}{1-x}$ for all $x \in (-1,1)$. Hence $f$ is represented by the power series
$\sum_{n=0}^\infty x^n$, which is centered at 0.

This power series representation of f might appear to be less than satisfying, 
because the interval of convergence of the power series is only a small part of the 
domain of the function f, and the reader might wonder if there is some other power 
series representation of f centered at 0 that has a larger interval of convergence 
than (—1,1). Unfortunately, there is no such power series representation, because 
Theorem 10.4.2 says that if a function has a power series representation centered at 
a number, that representation is unique, so we can do no better for our function f 
than we have already done. Actually, it should not be too surprising that the function 
f cannot be represented by a power series centered at 0 that has a larger interval 
of convergence. The function f has a vertical asymptote at x = 1, and there is no 
possibility of extending f to a continuous function defined on all of R. As we will see 
in Corollary 10.4.5, if a function is represented by a power series, then the function is 
continuous on the interior of the interval of convergence of the power series. Because 
our function f cannot be extended to a continuous function at x = 1, and because 
intervals of convergence of power series are symmetric about the number a by Theo- 
rem 9.5.4 (except possibly at the endpoints), then we see that there is no hope of 
finding a power series representation for f centered at 0 with a radius of convergence 
larger than 1.

(2) In Part (1) of this example we saw that $\frac{1}{1-x} = \sum_{n=0}^\infty x^n$ for all $x \in (-1,1)$. We
can use this power series representation to obtain power series representations of other
functions. For example, we see that
\[
\frac{1}{1+3x^2} = \frac{1}{1-(-3x^2)} = \sum_{n=0}^\infty (-3x^2)^n = \sum_{n=0}^\infty (-1)^n 3^n x^{2n}
\]
for all $x \in \mathbb{R}$ such that $-1 < -3x^2 < 1$, which means for all $x \in \left(-\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right)$.

We can find many other power series representations by starting with the power
series representation for $\frac{1}{1-x}$ and making other substitutions for $x$. However, doing so is
not always entirely trivial. Suppose that we want to find the power series representation
for the function $g\colon \mathbb{R} - \{-3\} \to \mathbb{R}$ defined by $g(x) = \frac{1}{x+3}$ for all $x \in \mathbb{R} - \{-3\}$. Which
substitution should we use with $\frac{1}{1-x}$ to obtain $\frac{1}{x+3}$? One straightforward approach
would be
\[
\frac{1}{x+3} = \frac{1}{1-(-(x+2))} = \sum_{n=0}^\infty (-(x+2))^n = \sum_{n=0}^\infty (-1)^n (x+2)^n
\]
for all $x \in \mathbb{R}$ such that $-1 < -(x+2) < 1$, which means for all $x \in (-3,-1)$. This
power series representation is centered at $-2$. Another approach would be
\[
\frac{1}{x+3} = \frac{1}{3} \cdot \frac{1}{1-\left(-\frac{x}{3}\right)} = \frac{1}{3} \sum_{n=0}^\infty \left(-\frac{x}{3}\right)^n = \sum_{n=0}^\infty \frac{(-1)^n}{3^{n+1}} x^n
\]
for all $x \in \mathbb{R}$ such that $-1 < -\frac{x}{3} < 1$, which means for all $x \in (-3,3)$. This latter
power series representation, which is centered at 0, has a larger interval of convergence
than our previous attempt, and it is therefore preferable. The reason the latter power
series has a larger interval of convergence than the former is that in the latter the
number about which the power series is centered is farther from $x = -3$, where the
function $g$ has a vertical asymptote. ◊
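The two representations of $\frac{1}{x+3}$ in Example 10.4.3 (2) can be compared numerically. The following is a minimal sketch; the function names are hypothetical and chosen only for this illustration, and 200 terms is an arbitrary cutoff. It evaluates partial sums of both series at points inside $(-3,-1)$, where both converge, and at points in $(-3,3)$ outside $(-3,-1)$, where only the series centered at 0 converges.

```python
def series_centered_at_minus_2(x, terms):
    """Partial sum of the series with terms (-1)**n * (x + 2)**n, n = 0, 1, ..."""
    return sum((-1) ** n * (x + 2) ** n for n in range(terms))

def series_centered_at_0(x, terms):
    """Partial sum of the series with terms (-1)**n * x**n / 3**(n + 1), n = 0, 1, ..."""
    return sum((-1) ** n * x ** n / 3 ** (n + 1) for n in range(terms))

def g(x):
    """The function g(x) = 1/(x + 3) from the example."""
    return 1.0 / (x + 3.0)

for x in (-2.5, -1.5, 0.0, 2.0):
    print(x, g(x),
          series_centered_at_minus_2(x, 200),
          series_centered_at_0(x, 200))
```

At $x = -2.5$ and $x = -1.5$ both partial sums are close to $g(x)$; at $x = 0$ and $x = 2$ the partial sums of the series centered at $-2$ are far from $g(x)$, while those of the series centered at 0 still agree with it.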


The method for finding power series representations in Example 10.4.3 has limited 
use, and will not help us find power series representations of some familiar functions 
such as $e^x$ and $\sin x$. We will find the power series representation for these functions
later in this section, after we have proved some theorems about power series. The 
main technical tool used to prove these theorems is Theorem 10.3.8, which will 
allow us to use the nice properties of uniformly convergent series that we saw in 
Section 10.3. We start by proving that functions represented by power series can be 
differentiated and integrated term by term, except possibly at the endpoints of the 
interval of convergence. Recall from Section 9.5 that if the radius of convergence of a
power series is $R = \infty$, then we will write $(a-R, a+R)$ to mean $(-\infty, \infty) = \mathbb{R}$, which
allows us to avoid special cases.


Theorem 10.4.4. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function.
Suppose that $f$ is represented by a power series $\sum_{n=0}^\infty c_n(x-a)^n$. Let $R$ be the radius
of convergence of $\sum_{n=0}^\infty c_n(x-a)^n$.

1. The power series $\sum_{n=1}^\infty n c_n (x-a)^{n-1}$ has radius of convergence $R$. The function
$f$ is differentiable on $(a-R, a+R)$, and $f'(x) = \sum_{n=1}^\infty n c_n (x-a)^{n-1}$ for
all $x \in (a-R, a+R)$.

2. The power series $\sum_{n=0}^\infty c_n \frac{(x-a)^{n+1}}{n+1}$ has radius of convergence $R$. The function
$f$ is integrable on any closed subinterval of $(a-R, a+R)$, and $\int_a^x f(t)\,dt =
\sum_{n=0}^\infty c_n \frac{(x-a)^{n+1}}{n+1}$ for all $x \in (a-R, a+R)$.


Proof. We will prove Part (1), leaving the remaining part to the reader in
Exercise 10.4.3. We follow [Pow94].


(1) Let $P$ be the radius of convergence of the power series $\sum_{n=1}^\infty n c_n (x-a)^{n-1}$.
We first show that $P = R$.

If $P = 0$, then $P \le R$. Now suppose that $P > 0$. Let $y \in (a-P, a+P)$. Then
$\sum_{n=1}^\infty n c_n (y-a)^{n-1}$ is absolutely convergent, which means that $\sum_{n=1}^\infty |n c_n (y-a)^{n-1}|$
is convergent. By Theorem 9.2.6 (3) we see that $\sum_{n=1}^\infty |y-a| \cdot |n c_n (y-a)^{n-1}|$
is convergent, and hence that $\sum_{n=1}^\infty |n c_n (y-a)^n|$ is convergent. If $n \in \mathbb{N}$, then
$|c_n (y-a)^n| \le |n c_n (y-a)^n|$. It follows from the Comparison Test (Theorem 9.3.2) that
$\sum_{n=1}^\infty |c_n (y-a)^n|$ is convergent, which means that $\sum_{n=1}^\infty c_n (y-a)^n$ is absolutely
convergent. Hence $\sum_{n=0}^\infty c_n (y-a)^n$ is absolutely convergent. Therefore $y \in [a-R, a+R]$.
It follows that $(a-P, a+P) \subseteq [a-R, a+R]$, which implies that $P \le R$.

We know that $R > 0$. Let $z \in (a-R, a+R)$. By Exercise 2.3.9 there is some
$T \in (0,R)$ such that $z \in (a-T, a+T)$. Let $w = a + T$. Then $|z-a| < |w-a|$ and
$w \in (a-R, a+R)$. Hence $\left|\frac{z-a}{w-a}\right| < 1$, and $\sum_{n=0}^\infty c_n (w-a)^n$ is absolutely
convergent. Therefore $\sum_{n=0}^\infty |c_n (w-a)^n|$ is convergent. It follows from Exercise 9.2.2
that $\{|c_n (w-a)^n|\}_{n=1}^\infty$ is bounded. Hence there is some $M \in \mathbb{R}$ such that
$|c_n (w-a)^n| \le M$ for all $n \in \mathbb{N}$.

If $n \in \mathbb{N}$, then
\[
|n c_n (z-a)^{n-1}| = n \cdot \frac{|c_n (w-a)^n|}{|w-a|} \cdot \left|\frac{z-a}{w-a}\right|^{n-1} \le \frac{M}{|w-a|}\, n \left|\frac{z-a}{w-a}\right|^{n-1}.
\]
Because $\left|\frac{z-a}{w-a}\right| < 1$, it follows from Exercise 9.4.5 that $\sum_{n=1}^\infty \frac{M}{|w-a|}\, n \left|\frac{z-a}{w-a}\right|^{n-1}$
is convergent. We then use the Comparison Test (Theorem 9.3.2) to see that
$\sum_{n=1}^\infty |n c_n (z-a)^{n-1}|$ is convergent, and hence $\sum_{n=1}^\infty n c_n (z-a)^{n-1}$ is absolutely
convergent. Therefore $z \in [a-P, a+P]$. It follows that $(a-R, a+R) \subseteq [a-P, a+P]$,
which implies that $R \le P$. We conclude that $P = R$.

Let $u \in (a-R, a+R)$. By Exercise 2.3.9 there is some $Q \in (0,R)$ such that
$u \in (a-Q, a+Q)$. By Theorem 10.3.8 we know that $\sum_{n=0}^\infty c_n(x-a)^n$ is uniformly
convergent on $[a-Q, a+Q]$. Because $\sum_{n=1}^\infty n c_n (x-a)^{n-1}$ has radius of convergence $R$,
it also follows that $\sum_{n=1}^\infty n c_n (x-a)^{n-1}$ is uniformly convergent on $[a-Q, a+Q]$. We
can now apply Theorem 10.3.11 to the restriction of $\sum_{n=0}^\infty c_n(x-a)^n$ to $(a-Q, a+Q)$,
where the value of $c$ in the statement of the theorem can be anything in $(a-Q, a+Q)$.
We deduce that there is some function $g\colon (a-Q, a+Q) \to \mathbb{R}$ such that $g$ is
differentiable, and the restriction of $\sum_{n=0}^\infty c_n(x-a)^n$ to $(a-Q, a+Q)$ converges
uniformly to $g$, and the restriction of $\sum_{n=0}^\infty [c_n(x-a)^n]' = \sum_{n=1}^\infty n c_n (x-a)^{n-1}$ to
$(a-Q, a+Q)$ converges pointwise to $g'$. However, we know by hypothesis that
$\sum_{n=0}^\infty c_n(x-a)^n$ converges pointwise to $f$ on its interval of convergence, and it follows
immediately that the restriction of $\sum_{n=0}^\infty c_n(x-a)^n$ to $(a-Q, a+Q)$ converges pointwise
to $f|_{(a-Q,a+Q)}$. By Lemma 10.3.3 and Lemma 10.3.4 we deduce that $f|_{(a-Q,a+Q)} = g$.
Hence $\sum_{n=1}^\infty n c_n (x-a)^{n-1} = f'(x)$
for all $x \in (a-Q, a+Q)$. In particular, $f'(u) = \sum_{n=1}^\infty n c_n (u-a)^{n-1}$. It follows that
$f'(x) = \sum_{n=1}^\infty n c_n (x-a)^{n-1}$ for all $x \in (a-R, a+R)$.


The following corollary is an immediate consequence of Theorem 10.4.4 (1) and 
Theorem 4.2.4, and we omit the proof. 


Corollary 10.4.5. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function.
Suppose that $f$ is represented by a power series $\sum_{n=0}^\infty c_n(x-a)^n$. Let $R$ be the radius
of convergence of $\sum_{n=0}^\infty c_n(x-a)^n$. Then $f$ is continuous on $(a-R, a+R)$.


Theorem 10.4.4 does not make any claims about the endpoints of the interval of 
convergence of the power series, and that is because, as we see in the first part of the 
following example, the convergence or divergence of the original power series at the 
endpoints does not necessarily imply the convergence or divergence at the endpoints 
of the derivative or integral. 


Example 10.4.6. 


(1) We saw in Example 9.5.2 (1) that the power series $\sum_{n=0}^\infty x^n$ has interval of
convergence $(-1,1)$, and hence radius of convergence 1, and that $\sum_{n=0}^\infty x^n = \frac{1}{1-x}$ for
all $x \in (-1,1)$. It then follows from Theorem 10.4.4 (1) that $\sum_{n=1}^\infty n x^{n-1}$ has radius of
convergence 1, and that
\[
\sum_{n=1}^\infty n x^{n-1} = \left[\frac{1}{1-x}\right]' = \frac{1}{(1-x)^2}
\]
for all $x \in (-1,1)$. It is left to the reader to verify that the interval of convergence of
the power series $\sum_{n=1}^\infty n x^{n-1}$ is $(-1,1)$, and so in this case differentiation does not
change the interval of convergence.

It follows from Theorem 10.4.4 (2) that $\sum_{n=0}^\infty \frac{x^{n+1}}{n+1}$ has radius of convergence 1,
and that
\[
\sum_{n=0}^\infty \frac{x^{n+1}}{n+1} = \int_0^x \frac{1}{1-t}\,dt
\]
for all $x \in (-1,1)$. Using Integration by Substitution for Definite Integrals
(Theorem 5.7.4) with the substitution $u = 1 - t$, and Definition 7.2.1, we see that
\[
\int_0^x \frac{1}{1-t}\,dt = -\int_1^{1-x} \frac{1}{u}\,du = -\ln(1-x)
\]
for all $x \in (-1,1)$. Hence
\[
\ln(1-x) = -\sum_{n=0}^\infty \frac{x^{n+1}}{n+1} \qquad (10.4.1)
\]
for all $x \in (-1,1)$. Let us examine the convergence of this power series at the endpoints
of the interval $(-1,1)$. If $x = 1$, then the power series $-\sum_{n=0}^\infty \frac{x^{n+1}}{n+1}$ is $\sum_{n=1}^\infty \frac{-1}{n}$, which
is $-1$ times the harmonic series, and which is divergent by Example 9.2.4 (5) and
Exercise 9.2.3. If $x = -1$, then the power series $-\sum_{n=0}^\infty \frac{x^{n+1}}{n+1}$ is $\sum_{n=1}^\infty \frac{(-1)^{n-1}}{n}$, which is
the alternating harmonic series, and which is convergent by Example 9.3.9. Hence the
interval of convergence of $-\sum_{n=0}^\infty \frac{x^{n+1}}{n+1}$ is $[-1,1)$, which is not the same as the interval
of convergence of $\sum_{n=0}^\infty x^n$, even though both power series have the same radius of
convergence.

We will see in Exercise 10.4.10 (5) that the power series formula for $\ln(1-x)$ in
Equation 10.4.1 also holds for $x = -1$.
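Equation 10.4.1 and the endpoint behavior just described can be checked numerically. The following is a minimal sketch; the function name `log_series` is hypothetical, and a finite number of terms can only suggest, not establish, convergence or divergence.

```python
import math

def log_series(x, terms):
    """Partial sum of the series in Equation 10.4.1: minus the sum of
    x**(n+1)/(n+1) for n = 0, 1, ..., terms - 1."""
    return -sum(x ** (n + 1) / (n + 1) for n in range(terms))

# Inside (-1, 1) and at x = -1 the partial sums approach ln(1 - x).
for x in (0.5, -0.5, -1.0):
    for terms in (10, 100, 1000):
        print(x, terms, log_series(x, terms), math.log(1.0 - x))

# At x = 1 the partial sums are minus the harmonic partial sums and keep decreasing.
for terms in (10, 100, 1000):
    print(1.0, terms, log_series(1.0, terms))
```

Convergence is fast at $x = \pm 0.5$ and noticeably slower at $x = -1$, where the series is the alternating harmonic series; at $x = 1$ the partial sums drift steadily downward.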

(2) The statement of Theorem 10.4.4 seems very straightforward, though in fact
it should not be taken for granted. By the Comparison Test (Theorem 9.3.2) it can
be verified that the series $\sum_{n=1}^\infty \frac{\sin(nx)}{n^2}$ is absolutely convergent for all $x \in \mathbb{R}$. However,
if we use term-by-term differentiation on this series, we obtain the series $\sum_{n=1}^\infty \frac{\cos(nx)}{n}$.
This new series is divergent when $x = 2\pi k$, for all $k \in \mathbb{Z}$, because for such values
of $x$ the series is $\sum_{n=1}^\infty \frac{1}{n}$, which is the harmonic series, and which was proved to be
divergent in Example 9.2.4 (5). Hence, even though the radius of convergence of a
power series is preserved under term-by-term differentiation and integration, that is
not necessarily true for all series with “variables.” ◊
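The behavior in Example 10.4.6 (2) can be seen in a short computation. The following is a minimal sketch with hypothetical function names; it takes $x = 0$, which is the case $k = 0$ of $x = 2\pi k$, and prints partial sums of the original series and of the term-by-term derivative.

```python
import math

def original_partial_sum(x, terms):
    """Partial sum of sin(n*x)/n**2 for n = 1, ..., terms."""
    return sum(math.sin(n * x) / n ** 2 for n in range(1, terms + 1))

def differentiated_partial_sum(x, terms):
    """Partial sum of cos(n*x)/n for n = 1, ..., terms."""
    return sum(math.cos(n * x) / n for n in range(1, terms + 1))

x = 0.0   # the case k = 0 of x = 2*pi*k
for terms in (10, 1000, 100000):
    print(terms,
          original_partial_sum(x, terms),
          differentiated_partial_sum(x, terms))
```

At $x = 0$ the partial sums of the original series are all 0, while those of the differentiated series are harmonic partial sums and keep growing as more terms are added.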


The following corollary, which will be very important to us in finding the power 
series representation of functions, is an immediate consequence of Theorem 10.4.4 (1), 
and we omit the proof. 


Corollary 10.4.7. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function.
Suppose that $f$ is represented by a power series $\sum_{n=0}^\infty c_n(x-a)^n$. Let $R$ be the radius
of convergence of $\sum_{n=0}^\infty c_n(x-a)^n$. Then $f$ is infinitely differentiable on $(a-R, a+R)$.


We saw in Example 4.2.5 (2) a function g: R — R that is differentiable, but such 
that g’ is not differentiable. It follows from Corollary 10.4.7 that the function g cannot 
be represented by a power series. 

We saw in Example 10.4.3 and Example 10.4.6 (1) a few instances of functions 
that are represented by power series, though these examples were essentially found 
by luck, because the power series $\sum_{n=0}^\infty x^n$ happens to be a geometric series. It would
be nice to have a more systematic way of finding power series representations for 
those functions that can be so represented, and we will see such a method shortly. 
We start with the following theorem, the statement of which makes implicit use of 
Corollary 10.4.7. Recall the definition of factorials given in Example 2.5.12. 


Theorem 10.4.8. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function.
Suppose that $f$ is represented by a power series $\sum_{n=0}^\infty c_n(x-a)^n$. Then
\[
c_n = \frac{f^{(n)}(a)}{n!}
\]
for all $n \in \mathbb{N} \cup \{0\}$.

Proof. Let $R$ be the radius of convergence of $\sum_{n=0}^\infty c_n(x-a)^n$. We know that $R > 0$,
and that $f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ for all $x \in (a-R, a+R)$. In particular $f(a) = c_0$. By
Theorem 10.4.4 (1) we know that $f'(x) = \sum_{n=1}^\infty n c_n (x-a)^{n-1}$ for all $x \in (a-R, a+R)$.
It can be verified by induction that if $k \in \mathbb{N}$, then
\[
f^{(k)}(x) = \sum_{n=k}^\infty n(n-1)(n-2)\cdots(n-k+1)\, c_n (x-a)^{n-k}
\]
for all $x \in (a-R, a+R)$; the details are left to the reader. Hence $f^{(k)}(a) =
k(k-1)(k-2)\cdots(k-k+1)\, c_k = k!\, c_k$ for each $k \in \mathbb{N}$, and the desired formula follows
immediately.


We note that Theorem 10.4.8 gives an alternative proof of Theorem 10.4.2. 
The following definition gives us a convenient notation for the power series 
described in Theorem 10.4.8, and for the partial sums of this power series. 


Definition 10.4.9. Let $I \subseteq \mathbb{R}$ be an open interval, let $a \in I$ and let $f\colon I \to \mathbb{R}$ be a
function. Suppose that $f$ is infinitely differentiable at $a$. For each $k \in \mathbb{N} \cup \{0\}$, the $k$-th
Taylor polynomial of $f$ centered at $a$ is the polynomial function $T^f_{k,a}\colon I \to \mathbb{R}$ defined
by
\[
T^f_{k,a}(x) = \sum_{i=0}^{k} \frac{f^{(i)}(a)}{i!} (x-a)^i
\]
for all $x \in I$. The Taylor series of $f$ centered at $a$ is the series
\[
T^f_a(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x-a)^n.
\]
When $a = 0$, the $k$-th Taylor polynomial of $f$ centered at $a$ is called the $k$-th Maclaurin
polynomial of $f$, and the Taylor series of $f$ centered at $a$ is called the Maclaurin
series of $f$. △


Observe that the Taylor polynomials of a function are the partial sums of its 
Taylor series. Moreover, these polynomials are quite interesting and useful in their 
own right. Suppose that $f\colon I \to \mathbb{R}$ is a function, where $I \subseteq \mathbb{R}$ is an open interval, and
let $a \in I$. Suppose that $f$ is infinitely differentiable at $a$. Let $n \in \mathbb{N}$. It would be nice
to find a polynomial, denoted $p_n$, such that $p_n$ is the best possible approximation
of $f$ by a polynomial of degree $n$. Of course, there could be a number of possible
choices of criteria for what we might mean by “best possible approximation,” but a
very commonly used criterion is that we would want $p_n$ to agree with $f$ at $a$, and we
would want the $k$-th derivative of $p_n$ to agree with the $k$-th derivative of $f$ at $a$ for all
$k \in \{1, 2, \ldots, n\}$. It turns out that if $p_n$ satisfies this criterion, then it must in fact equal
the $n$-th Taylor polynomial of $f$ centered at $a$; the proof is virtually the same as the
proof of Theorem 10.4.8, except that we start with a polynomial rather than a power
series, and we omit the details. Hence, the Taylor polynomials are, in many cases, very
useful for approximating the value of the original function, at least near the point $a$. In
general, the higher the value of $n$, the better the approximation of $f$. For example, if
$f\colon \mathbb{R} \to \mathbb{R}$ is defined by $f(x) = \sin x$ for all $x \in \mathbb{R}$, then the graph of $f$, and the graphs
of its Maclaurin polynomials for $n \in \{1, 3, 5, 7\}$, are seen in Figure 10.4.1. It should
be noted, however, that whereas Taylor polynomials are useful for approximating 
functions in some situations, for modern computing they are in general not as useful 
as other polynomial approximations of functions, or some non-polynomial algorithms 
(such as the CORDIC algorithm for approximating trigonometric and logarithmic 
functions) that are particularly suited to computer and calculator architecture; see 
[Mul06] for details. 
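
The improvement of the approximation with the degree can be checked numerically. The following Python sketch is an illustration only and is not part of the formal development: it evaluates the Maclaurin polynomials of $\sin x$ of degrees 1, 3, 5 and 7 and compares them with the library value `math.sin`. The function names `sin_maclaurin` and `errors_at` are hypothetical, and the coefficients are taken from the familiar Maclaurin series of sine (the subject of Exercise 10.4.6).

```python
import math


def sin_maclaurin(x: float, n: int) -> float:
    """Evaluate the n-th Maclaurin polynomial of sin at x.

    Only odd powers appear: x - x^3/3! + x^5/5! - ...
    """
    total = 0.0
    for k in range(1, n + 1, 2):
        # k = 1, 5, 9, ... carry a plus sign; k = 3, 7, 11, ... a minus sign.
        sign = -1.0 if (k // 2) % 2 else 1.0
        total += sign * x**k / math.factorial(k)
    return total


def errors_at(x: float, degrees=(1, 3, 5, 7)) -> list:
    """Absolute errors |sin x - T_n(x)| for each degree n in `degrees`."""
    return [abs(math.sin(x) - sin_maclaurin(x, n)) for n in degrees]


if __name__ == "__main__":
    for x in (0.5, 2.0):
        print(x, errors_at(x))
```

At both sample points the errors shrink as the degree grows, and they shrink much faster at $x = 0.5$ than at $x = 2$, in line with the remark that the approximation is best near the center $a = 0$.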


Fig. 10.4.1. 


The following corollary is simply a restatement of Theorem 10.4.8, using the 
terminology of Definition 10.4.9. 


Corollary 10.4.10. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function.
Suppose that $f$ is represented by a power series $\sum_{n=0}^{\infty} c_n (x-a)^n$. Then $\sum_{n=0}^{\infty} c_n (x-a)^n$
is the Taylor series of $f$ centered at $a$.


It is important to stress that Corollary 10.4.10 does not say that every function 
is represented by its Taylor series, but only that if a function has a power series 
representation, then that representation must be the Taylor series. Indeed, as we will 
see in the fourth part of the following example, while an infinitely differentiable 
function always has a Taylor series centered at any number a in its domain, such a 
function is not always represented by its Taylor series centered at a. 


Example 10.4.11. 


(1) Let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = e^x$ for all $x \in \mathbb{R}$. We compute the Maclaurin
series for $f$. It follows from Theorem 7.2.7 (2) that $f$ is infinitely differentiable, and
that $f^{(n)}(x) = e^x$ for all $x \in \mathbb{R}$ and all $n \in \mathbb{N} \cup \{0\}$. By Theorem 7.2.8 (1) we see that
$f^{(n)}(0) = 1$ for all $n \in \mathbb{N} \cup \{0\}$. Hence the Maclaurin series of $f$ is

\[ T^{f,0}(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \]

We saw in Example 9.5.2 (2) that the interval of convergence of this power series is $\mathbb{R}$.
Is the function $f$ represented by its Maclaurin series? More specifically, does $e^x$
equal its Maclaurin series for all $x \in \mathbb{R}$? The answer will turn out to be yes, though
we will have to wait until Example 10.4.16 to see a proof of this fact.
(2) Let $g\colon (0,\infty) \to \mathbb{R}$ be defined by $g(x) = \ln x$ for all $x \in (0,\infty)$. The function $g$
does not have a Maclaurin series, because $g$ is not defined at 0, but we will compute
the Taylor series for $g$ centered at $a = 1$. We know by Exercise 7.2.2 that $g$ is infinitely
differentiable, and that $g^{(n)}(x) = \frac{(-1)^{n-1}(n-1)!}{x^n}$ for all $n \in \mathbb{N}$ and all $x \in (0,\infty)$. It
follows that $g^{(n)}(1) = (-1)^{n-1}(n-1)!$ for all $n \in \mathbb{N}$. By Theorem 7.2.3 (1) we see
that $g^{(0)}(1) = 0$. Hence the Taylor series of $g$ centered at 1 is

\[ T^{g,1}(x) = \sum_{n=0}^{\infty} \frac{g^{(n)}(1)}{n!} (x-1)^n = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}(n-1)!}{n!} (x-1)^n = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} (x-1)^n. \]

We saw in Exercise 9.5.3 that the interval of convergence of this power series is $(0, 2]$.

Is the function $g$ represented by its Taylor series centered at $a = 1$? The answer is
yes, and in contrast to Part (1) of this example, we can prove most of this fact now,
because of what we saw in Example 10.4.6 (1). It is left to the reader to verify that if
$x = 1 - w$ is substituted in Equation 10.4.1, it follows that $\ln w = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} (w-1)^n$
for all $w \in (0,2)$, which means that $g(x)$ equals its Taylor series for all $x \in (0,2)$.
It takes additional effort to show that $g(x)$ equals its Taylor series at $x = 2$; see
Exercise 10.4.10 (5) for details. We deduce that the function $g$ is represented by its
Taylor series. However, it is important to observe that such representation does not
hold for all $x$ in the domain of $g$, which is $(0,\infty)$, because the Taylor series is not
convergent on all of the domain of $g$. There is no way to avoid this problem, because
by Corollary 10.4.10 the function $g$ cannot be represented by any other power series.

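(3) Let $r \in \mathbb{R}$, and let $p\colon (-1,1) \to \mathbb{R}$ be defined by $p(x) = (1+x)^r$ for all
$x \in (-1,1)$.

We compute the Maclaurin series for $p$. It follows from Theorem 7.2.13 (1)
together with the standard rules for differentiation (discussed in Section 4.3) that $p$ is
infinitely differentiable, and that if $x \in (-1,1)$, then

\[ p^{(k)}(x) = \begin{cases} r(r-1)\cdots(r-k+1)(1+x)^{r-k}, & \text{if } k \in \mathbb{N} \\ (1+x)^r, & \text{if } k = 0. \end{cases} \]

By Theorem 7.2.12 (1) we see that

\[ p^{(k)}(0) = \begin{cases} r(r-1)\cdots(r-k+1), & \text{if } k \in \mathbb{N} \\ 1, & \text{if } k = 0. \end{cases} \]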
For convenience, we use the following standard notation. Let $a \in \mathbb{R}$, and let
$k \in \mathbb{N} \cup \{0\}$. The binomial coefficient $\binom{a}{k}$ is defined by

\[ \binom{a}{k} = \begin{cases} \dfrac{a(a-1)\cdots(a-k+1)}{k!}, & \text{if } k \in \mathbb{N} \\ 1, & \text{if } k = 0. \end{cases} \]

We note that if $a \in \mathbb{N} \cup \{0\}$ and $k \in \{0, \ldots, a\}$, then this definition of $\binom{a}{k}$ agrees with
the standard definition of $\binom{a}{k}$ found in the Binomial Theorem, counting problems,
Pascal’s triangle and elsewhere; see [Blo10, Section 7.7] for a very brief discussion of
these topics.

Using the above notation, we see that the Maclaurin series of p is 


\[ T^{p,0}(x) = \sum_{n=0}^{\infty} \frac{p^{(n)}(0)}{n!} x^n = \sum_{n=0}^{\infty} \binom{r}{n} x^n. \]

This power series is called the binomial series for $r$. It is seen in Exercise 10.4.5 (3)
that if $r \in \mathbb{N} \cup \{0\}$ then the radius of convergence of the binomial series is $\infty$, and
if $r \notin \mathbb{N} \cup \{0\}$ then the radius of convergence of the binomial series is 1; in either
case the interval of convergence contains $(-1, 1)$. (In those cases where the radius of
convergence is 1, the convergence or divergence of the binomial series at $x = 1$ and
$x = -1$ depends upon the value of $r$; see [How66] for details.)

Is the function p represented by its Maclaurin series? The answer is yes, and we 
can prove this fact now, though not by the straightforward method of Part (2) of this 
example, but rather by a clever trick that is available for this particular power series. 
Let $q\colon (-1,1) \to \mathbb{R}$ be defined by $q(x) = (1+x)^{-r} \sum_{n=0}^{\infty} \binom{r}{n} x^n$ for all $x \in (-1,1)$. It
follows from Theorem 10.4.4 (1) together with the Product Rule (Theorem 4.3.1 (4))
that $q$ is differentiable. It is left to the reader in Exercise 10.4.5 (4) to verify that
$q'(x) = 0$ for all $x \in (-1,1)$. It is straightforward to verify that $q(0) = 1$, and we
then use Lemma 4.4.7 (1) to deduce that $q(x) = 1$ for all $x \in (-1, 1)$. It follows that
$p(x)$ equals its Maclaurin series for all $x \in (-1,1)$, and hence $p$ is represented by its
Maclaurin series.

(4) The following example is due to Augustin Louis Cauchy (1789-1857). Let
$h\colon \mathbb{R} \to \mathbb{R}$ be defined by

\[ h(x) = \begin{cases} e^{-1/x^2}, & \text{if } x \neq 0 \\ 0, & \text{if } x = 0. \end{cases} \]

The hard work for this example was done in Exercise 6.3.10 (3), where it was proved
that $h$ is infinitely differentiable, and that $h^{(n)}(0) = 0$ for all $n \in \mathbb{N}$. It follows that the
Maclaurin series of $h$ is the series that is constantly zero, which clearly has radius
of convergence $R = \infty$. However, we know that $h(x) \neq 0$ for all $x \in \mathbb{R} - \{0\}$, which
follows from the fact that the exponential function has codomain $(0,\infty)$. Hence, even
though the function $h$ is infinitely differentiable, it is not represented by its Maclaurin
series. ♦
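
The behavior described in Part (2) of the example can be observed numerically. The following Python sketch is an illustration under stated assumptions, not a proof: the function name `ln_taylor_partial` is hypothetical, and `math.log` is used only as a reference value. It sums the first few terms of the Taylor series of $\ln x$ centered at 1 at points inside, on the boundary of, and outside the interval of convergence $(0, 2]$.

```python
import math


def ln_taylor_partial(x: float, terms: int) -> float:
    """Sum of the first `terms` terms of sum_{n>=1} (-1)^(n-1) (x-1)^n / n."""
    return sum((-1) ** (n - 1) * (x - 1) ** n / n for n in range(1, terms + 1))


if __name__ == "__main__":
    for x in (0.5, 1.5, 2.0, 3.0):
        approximations = [ln_taylor_partial(x, k) for k in (10, 50, 200)]
        print(x, math.log(x), approximations)
```

Inside $(0, 2)$ the partial sums settle quickly on $\ln x$; at the endpoint $x = 2$ they converge, but slowly; and at $x = 3$, which lies in the domain of $g$ but outside the interval of convergence, they grow without bound.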
We saw in Example 10.4.11 (2) and (3) two functions that were represented by their
Taylor series. However, the proofs in these two cases were very particular to the 
functions under consideration. We now turn to a more broadly applicable method 
for proving that functions are represented by their Taylor series, stated in Corol- 
lary 10.4.15 below; this method also does not work in all cases, but it is nonetheless 
quite useful, for example for the function in Example 10.4.11 (1), as we will see in 
Example 10.4.16. We start with the following definition and lemma, which reformulate 
this issue in a convenient way. 


Definition 10.4.12. Let $I \subseteq \mathbb{R}$ be an open interval, let $a \in I$ and let $f\colon I \to \mathbb{R}$ be a
function. Suppose that $f$ is infinitely differentiable at $a$. For each $n \in \mathbb{N} \cup \{0\}$, the $n$-th
Taylor polynomial remainder of $f$ centered at $a$ is the function $R_n^{f,a}\colon I \to \mathbb{R}$ defined
by $R_n^{f,a} = f - T_n^{f,a}$. △


The following lemma is deduced immediately from Definition 10.4.12, the fact 
that the Taylor polynomials are the partial sums of the Taylor series, and Exercise 8.2.5; 
we omit the proof. 


Lemma 10.4.13. Let $I \subseteq \mathbb{R}$ be an open interval, let $f\colon I \to \mathbb{R}$ be a function and let
$a \in I$. Suppose that $f$ is infinitely differentiable at $a$. Let $x \in I$. Then $f(x) = T^{f,a}(x)$ if
and only if $\lim_{n \to \infty} R_n^{f,a}(x) = 0$.


At first glance, Lemma 10.4.13 may not appear to be very helpful, because it looks 
as if we are just renaming things; that is, figuring out whether or not the $n$-th Taylor
polynomial remainder of a function converges to zero does not appear to be any easier
than figuring out whether or not the Taylor series converges to the function. However,
as we see in the following theorem, corollary and example, there is a convenient
expression for the $n$-th Taylor polynomial remainder of a function, and this expression
turns out to be easy to work with for some specific functions. This theorem, named 
after Lagrange when stated in its present form, is actually just a restatement of what 
we earlier called Taylor’s Theorem (Theorem 4.4.6), though this time with domain an 
open interval, and hence we state it without proof. 


Theorem 10.4.14 (Lagrange Form of the Remainder Theorem). Let $I \subseteq \mathbb{R}$ be an
open interval, let $a \in I$, let $f\colon I \to \mathbb{R}$ be a function and let $n \in \mathbb{N} \cup \{0\}$. Suppose that
$f$ is $n+1$ times differentiable. Let $x \in I$. Then there is some $p$ strictly between $x$ and $a$
(except that $p = a$ when $x = a$) such that

\[ R_n^{f,a}(x) = \frac{f^{(n+1)}(p)}{(n+1)!} (x-a)^{n+1}. \]

Corollary 10.4.15. Let $I \subseteq \mathbb{R}$ be an open interval, let $a \in I$ and let $f\colon I \to \mathbb{R}$ be a
function. Suppose that $f$ is infinitely differentiable, and that there is some $M \in \mathbb{R}$ such
that $|f^{(n)}(x)| \le M^n$ for all $x \in I$ and all $n \in \mathbb{N} \cup \{0\}$. Then

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n \]

for all $x \in I$.
Proof. Let $x \in I$. Let $n \in \mathbb{N} \cup \{0\}$. By the Lagrange Form of the Remainder Theorem
(Theorem 10.4.14) there is some $p$ between $x$ and $a$ such that

\[ R_n^{f,a}(x) = \frac{f^{(n+1)}(p)}{(n+1)!} (x-a)^{n+1}. \]

Hence

\[ 0 \le \left| R_n^{f,a}(x) \right| = \frac{|f^{(n+1)}(p)|}{(n+1)!} |x-a|^{n+1} \le \frac{M^{n+1} |x-a|^{n+1}}{(n+1)!}. \]

By the final remark in Example 9.5.2 (2) and Exercise 8.2.4 we know that
$\lim_{n \to \infty} \frac{M^{n+1} |x-a|^{n+1}}{(n+1)!} = 0$. It then follows from the Squeeze Theorem for Sequences (Theorem 8.2.12)
that $\{ |R_n^{f,a}(x)| \}_{n=1}^{\infty}$ is convergent and $\lim_{n \to \infty} |R_n^{f,a}(x)| = 0$. By Exercise 8.2.13 (2)
we deduce that $\lim_{n \to \infty} R_n^{f,a}(x) = 0$. It now follows from Lemma 10.4.13
that $f(x) = T^{f,a}(x)$.


Example 10.4.16. We saw in Example 10.4.11 (1) that the Maclaurin series for $e^x$
is $\sum_{n=0}^{\infty} \frac{x^n}{n!}$, and we saw in Example 9.5.2 (2) that the interval of convergence of this
power series is $\mathbb{R}$. We want to show that $e^x$ equals its Maclaurin series for all $x \in \mathbb{R}$.
We cannot apply Corollary 10.4.15 directly to $e^x$, because there does not exist an “$M$”
as in the statement of that corollary that works for all $x \in \mathbb{R}$, but we can apply the
corollary to the restriction of $e^x$ to any bounded open interval in $\mathbb{R}$.

Let $w \in \mathbb{R}$. Then there is some $c \in (0,\infty)$ such that $w \in (-c,c)$. By Theorem 7.2.7 (3)
and Theorem 7.2.8 (1) we see that $e^c > e^0 = 1$. Moreover, by Theorem 7.2.7 (3)
again and Exercise 2.5.1, it follows that if $y \in (-c,c)$ and $n \in \mathbb{N}$, then
$|f^{(n)}(y)| = |e^y| = e^y < e^c \le [e^c]^n$. Corollary 10.4.15 now implies that $e^y = \sum_{n=0}^{\infty} \frac{y^n}{n!}$ for
all $y \in (-c,c)$. In particular, we deduce that $e^w = \sum_{n=0}^{\infty} \frac{w^n}{n!}$. Therefore $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$
for all $x \in \mathbb{R}$.

As a special case of the above, we see that

\[ e = e^1 = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots. \tag{10.4.2} \]

Additionally, we can obtain the Maclaurin series for other useful functions by
substituting into the Maclaurin series for $e^x$. For example, we see that

\[ e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{n!} \]

for all $x \in \mathbb{R}$. Although we did not obtain this power series representation of $e^{-x^2}$ directly
by Definition 10.4.9, this power series must be the Maclaurin series nonetheless
because of Corollary 10.4.10. It is very nice that we can compute the Maclaurin series
of $e^{-x^2}$ by this indirect method, because attempting to do so by Definition 10.4.9
would be quite tedious, because the derivatives of $e^{-x^2}$ are messy. ♦
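
The role of the remainder estimate in Example 10.4.16 can be illustrated numerically. The following Python sketch is an illustration only: the names `exp_partial` and `lagrange_bound` are hypothetical, and the bound $e^{|x|} |x|^{n+1} / (n+1)!$ is the estimate obtained from Theorem 10.4.14 with $a = 0$, using the fact that $e^p \le e^{|x|}$ for any $p$ between 0 and $x$. The code compares the true error of the $n$-th Maclaurin polynomial of $e^x$ with this bound.

```python
import math


def exp_partial(x: float, n: int) -> float:
    """n-th Maclaurin polynomial of exp at x: sum_{k=0}^{n} x^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k  # turns x^(k-1)/(k-1)! into x^k/k!
        total += term
    return total


def lagrange_bound(x: float, n: int) -> float:
    """Upper bound e^{|x|} |x|^{n+1} / (n+1)! for the n-th remainder at x."""
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)


if __name__ == "__main__":
    for n in (5, 10, 20):
        error = abs(math.exp(3.0) - exp_partial(3.0, n))
        print(n, error, lagrange_bound(3.0, n))
```

For a fixed $x$ the bound tends to zero as $n$ grows, because the factorial in the denominator eventually dominates the power in the numerator; this is the limit that the proof of Corollary 10.4.15 relies on.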
We conclude this section with the following two applications of power series, the
first to the number $\pi$, and the second to the number $e$.


Example 10.4.17. In Section 7.4 we saw various properties of the number $\pi$, but
one thing we were not able to do in that section is something very important from a
computational point of view, which is to provide the first few digits of the decimal
expansion of $\pi$. We know by Theorem 7.4.5 that $\pi$ is an irrational number, and so
whereas it is possible to compute finitely many digits in the decimal expansion of $\pi$,
we cannot write down the entire decimal expansion. Of course, everyone learns at a
young age that $\pi = 3.14159\ldots$, but we want to prove that fact rigorously based upon
the definition of $\pi$ given in Definition 7.3.5. There are a number of ways to approach
this problem, the simplest being with series. The most familiar way to write $\pi$ as a
series is

\[ \pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right). \tag{10.4.3} \]

We will present a different series for $\pi$, both because the above series is proved
using the arctangent function, which we have not discussed, and because this series
converges extremely slowly, and therefore many terms are needed to obtain accuracy
for even the first few digits in the decimal expansion of $\pi$. Instead, we provide a
different way to write $\pi$ as a series, which converges much faster than the above series,
and which uses the arcsine function, which was discussed in detail in Section 7.3,
rather than arctangent.

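We find the Maclaurin series for arcsin as follows. Using the binomial series
given in Example 10.4.11 (3), we see that

\[ (1+x)^{-1/2} = 1 + \frac{-\frac{1}{2}}{1!} x + \frac{(-\frac{1}{2})(-\frac{3}{2})}{2!} x^2 + \frac{(-\frac{1}{2})(-\frac{3}{2})(-\frac{5}{2})}{3!} x^3 + \cdots
= 1 - \frac{1}{2 \cdot 1!} x + \frac{1 \cdot 3}{2^2 \cdot 2!} x^2 - \frac{1 \cdot 3 \cdot 5}{2^3 \cdot 3!} x^3 + \cdots \]

for all $x \in (-1,1)$. Using Lemma 7.3.7 (3) we see that

\[ \arcsin' x = \frac{1}{\sqrt{1-x^2}} = 1 + \frac{1}{2 \cdot 1!} x^2 + \frac{1 \cdot 3}{2^2 \cdot 2!} x^4 + \frac{1 \cdot 3 \cdot 5}{2^3 \cdot 3!} x^6 + \cdots \]

for all $x \in (-1, 1)$. It now follows from Lemma 7.3.7, the Fundamental Theorem of
Calculus Version II (Theorem 5.6.4) and Theorem 10.4.4 (2) that

\[ \arcsin x = \int_0^x \arcsin' t \, dt = x + \frac{1}{2 \cdot 1! \cdot 3} x^3 + \frac{1 \cdot 3}{2^2 \cdot 2! \cdot 5} x^5 + \frac{1 \cdot 3 \cdot 5}{2^3 \cdot 3! \cdot 7} x^7 + \cdots \]

for all $x \in (-1, 1)$.
We know by Exercise 7.3.8 that $\sin\left(\frac{\pi}{6}\right) = \frac{1}{2}$, and hence $\frac{\pi}{6} = \arcsin\left(\frac{1}{2}\right)$. Therefore

\[ \pi = 6 \arcsin\left(\frac{1}{2}\right) = 6\left[ \frac{1}{2} + \frac{1}{2 \cdot 1! \cdot 3 \cdot 2^3} + \frac{1 \cdot 3}{2^2 \cdot 2! \cdot 5 \cdot 2^5} + \frac{1 \cdot 3 \cdot 5}{2^3 \cdot 3! \cdot 7 \cdot 2^7} + \cdots \right]. \]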
Summing the first 10 terms of this series yields $\pi \approx 3.141592623\ldots$, which is correct
up to the seventh decimal place. By contrast, summing the first 10 terms of the series
in Equation 10.4.3 yields $3.041839619\ldots$, which is a much worse approximation of
$\pi$.
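
The two ten-term sums can be reproduced with a short computation. The following Python sketch is an illustration only; the function names `pi_arcsin` and `pi_leibniz` are hypothetical. The first function sums the arcsine series at $x = \frac{1}{2}$ in exact rational arithmetic and multiplies by 6, and the second sums the series in Equation 10.4.3.

```python
import math
from fractions import Fraction


def pi_arcsin(terms: int) -> float:
    """6 times the first `terms` terms of the Maclaurin series of arcsin at 1/2."""
    total = Fraction(0)
    coeff = Fraction(1)  # (1*3*...*(2n-1)) / (2^n * n!), which is 1 when n = 0
    for n in range(terms):
        total += coeff / ((2 * n + 1) * 2 ** (2 * n + 1))
        coeff *= Fraction(2 * n + 1, 2 * (n + 1))  # advance the coefficient to n + 1
    return float(6 * total)


def pi_leibniz(terms: int) -> float:
    """4 times the first `terms` terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(terms))


if __name__ == "__main__":
    print(pi_arcsin(10), pi_leibniz(10), math.pi)
```

The arcsine series gains roughly one correct digit for every one or two additional terms, because each term is smaller than the previous one by a factor of about 4, whereas the terms of the series in Equation 10.4.3 decrease only like $\frac{1}{n}$.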

The problem of computing the digits of the decimal expansion of $\pi$ has a long
history, and continues to be the subject of research; see [AH01] for more information.
♦


We saw in Theorem 7.4.5 that the number $\pi$ is irrational. We now have the tools
to prove that the number $e$ is irrational as well.


Theorem 10.4.18. The number e is irrational. 


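Proof. Suppose that $e$ is rational. By Definition 7.2.9 we know that $e > 0$, and hence
by Lemma 2.4.12 (2) there are $a, b \in \mathbb{N}$ such that $e = \frac{a}{b}$. Let

\[ q = e - \left( 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{b!} \right). \]

Then $b!\,q$ is an integer, because each term in $q$ is a fraction that has a denominator
with factors that are all factors in $b!$.
By Equation 10.4.2, we see that

\[ q = \frac{1}{(b+1)!} + \frac{1}{(b+2)!} + \frac{1}{(b+3)!} + \cdots. \]

Using Exercise 9.2.6 (2) and Example 9.2.4 (4), and noting that $b \ge 1$ and so $\left| \frac{1}{b+1} \right| < 1$,
we see that

\[ \begin{aligned}
b!\,q &= b! \left( \frac{1}{(b+1)!} + \frac{1}{(b+2)!} + \frac{1}{(b+3)!} + \cdots \right) \\
&= \frac{1}{b+1} + \frac{1}{(b+1)(b+2)} + \frac{1}{(b+1)(b+2)(b+3)} + \cdots \\
&< \frac{1}{b+1} + \frac{1}{(b+1)^2} + \frac{1}{(b+1)^3} + \cdots \\
&= \frac{\frac{1}{b+1}}{1 - \frac{1}{b+1}} = \frac{1}{b} \le 1.
\end{aligned} \]

It is evident that $q > 0$, and because $b! \ge 1$, then $b!\,q > 0$. Hence $0 < b!\,q < 1$, which
is a contradiction to Theorem 2.4.10 (2). We conclude that $e$ is irrational.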
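
The key estimate in the proof, namely that $b!\,q$ lies strictly between 0 and $\frac{1}{b}$, can be checked in exact arithmetic for small values of $b$. The following Python sketch is an illustration only and plays no role in the proof. The function name `scaled_tail` is hypothetical, and the code sums only finitely many terms of the series for $b!\,q$, so it computes a lower approximation of $b!\,q$ rather than its exact value.

```python
from fractions import Fraction
from math import factorial


def scaled_tail(b: int, terms: int = 40) -> Fraction:
    """Partial sum of b!/(b+1)! + b!/(b+2)! + ... using the first `terms` terms."""
    return sum(
        (Fraction(factorial(b), factorial(b + k)) for k in range(1, terms + 1)),
        Fraction(0),
    )


if __name__ == "__main__":
    for b in range(1, 9):
        value = scaled_tail(b)
        print(b, float(value), float(Fraction(1, b)))
```

For each $b$ the computed value is positive and smaller than $\frac{1}{b}$, so it cannot be an integer; the proof shows that the same is true of the full infinite sum.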


Similarly to what was stated about the number $\pi$ in Section 7.4, the number $e$ is not
only an irrational number, but it is in fact a transcendental number, which means that 
it is not the root of any polynomial with rational coefficients; see [Spi67, Chapter 20] 
for a proof. 
Reflections


The reader who has taken a standard introductory calculus sequence should be 
familiar with many of the results and examples of power series discussed in this 
section, and hence this section might appear to consist of a lot of effort aimed at 
proving some elementary and familiar results. In fact, the material in this section 
should not be taken for granted. It is hard to overestimate the importance of power 
series in both the history and applications of calculus. The relative brevity of the 
proofs in this section should not fool the reader into thinking that the material in this 
section is simple; the proofs involve some substantial ideas, and they appear brief 
only because the hard work was already done in the previous sections of this chapter. 


Exercises 


Exercise 10.4.1. [Used in Exercise 10.4.9.] Let $c \in \mathbb{R} - \{0\}$. Let $f\colon \mathbb{R} - \{c\} \to \mathbb{R}$ be
defined by $f(x) = \frac{1}{x-c}$ for all $x \in \mathbb{R} - \{c\}$.


(1) Find the Maclaurin series for f. 
(2) What is the interval of convergence of this Maclaurin series? 
(3) Prove that f equals its Maclaurin series for all x in the interval of convergence. 


Exercise 10.4.2. Let p: IR — R be a polynomial function. What is the Maclaurin 
series of p? 


Exercise 10.4.3. [Used in Theorem 10.4.4.] Prove Theorem 10.4.4 (2). 


Exercise 10.4.4. Give a direct proof of Corollary 10.4.5 using Theorem 10.3.8, but 
not using Theorem 10.4.4. [Use Exercise 2.3.9.] 


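Exercise 10.4.5. [Used in Example 10.4.11.] This exercise uses Example 10.4.11 (3).

(1) Let $a \in \mathbb{R}$, and let $k \in \mathbb{N} \cup \{0\}$. Prove that $\binom{a}{k+1} = \binom{a}{k} \frac{a-k}{k+1}$.

(2) Let $a \in \mathbb{R}$, and let $k \in \mathbb{N} \cup \{0\}$. Prove that if $a \notin \mathbb{N} \cup \{0\}$ then $\binom{a}{k} \neq 0$, and if
$a \in \mathbb{N} \cup \{0\}$ and $k > a$ then $\binom{a}{k} = 0$.

(3) Prove that if $r \in \mathbb{N} \cup \{0\}$ then the radius of convergence of the binomial series
is $\infty$, and if $r \notin \mathbb{N} \cup \{0\}$ then the radius of convergence of the binomial series
is 1.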

(4) Prove that q'(x) = 0 for all x € (—1,1). 


Exercise 10.4.6. [Used in Exercise 10.4.13.] 


(1) Prove that the Maclaurin series for $\sin x$ is $\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$.

(2) Prove that $\sin x$ equals its Maclaurin series for all $x \in \mathbb{R}$.


Exercise 10.4.7. Let $f\colon \mathbb{R} \to \mathbb{R}$ be a function. Suppose that $f$ is represented by a
Maclaurin series $\sum_{n=0}^{\infty} c_n x^n$. Find a condition on the sequence $\{c_n\}_{n=0}^{\infty}$ that is equivalent
to the condition that $f(-x) = f(x)$ for all $x \in \mathbb{R}$, and prove the equivalence. A
function that satisfies this condition is called an even function.

Exercise 10.4.8. Let $A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function.
Suppose that $f$ is represented by a power series $\sum_{n=0}^{\infty} c_n (x-a)^n$. Let $R$ be the radius
of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$. Find a condition on the sequence $\{c_n\}_{n=0}^{\infty}$ that is
equivalent to $f$ having a local maximum at $a$, and prove the equivalence.


Exercise 10.4.9. In this exercise we use the sequence of Fibonacci numbers, denoted
$\{F_n\}_{n=1}^{\infty}$, which was defined in Example 8.4.10. The purpose of this exercise is to use
power series to prove that

\[ F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^n - \left( \frac{1-\sqrt{5}}{2} \right)^n \right] \]

for all $n \in \mathbb{N}$. (There are other proofs of this formula that do not involve power series,
for example the proof in [Blo10, Exercise 6.4.12].) This formula is known as Binet’s
formula, though it is also attributed to the earlier mathematicians Daniel Bernoulli
and Leonhard Euler. Using the notation $\phi$ for the “golden ratio,” as discussed in
Example 8.4.10, we can write Binet’s formula as $F_n = \frac{1}{\sqrt{5}} \left[ \phi^n - (-1/\phi)^n \right]$ for all
$n \in \mathbb{N}$.

(1) Find the two roots $r_1$ and $r_2$ of the polynomial $x^2 + x - 1$, and find numbers
$A, B \in \mathbb{R}$ such that $\frac{1}{1-x-x^2} = \frac{A}{x-r_1} + \frac{B}{x-r_2}$. This last expression is the partial
fraction decomposition of $\frac{1}{1-x-x^2}$.
(2) Use Part (1) of this exercise together with Exercise 10.4.1 to find the Maclaurin
series of $\frac{1}{1-x-x^2}$.
(3) It follows from Exercise 9.5.7 that $\frac{1}{1-x-x^2}$ is represented by the power series
$\sum_{n=0}^{\infty} F_{n+1} x^n$. Use Theorem 10.4.2 together with Part (2) of this exercise to
derive Binet’s formula.


Exercise 10.4.10. [Used in Example 10.4.6, Example 10.4.11 and Exercise 10.4.11.] Let
$A \subseteq \mathbb{R}$ be a set, let $a \in A$ and let $f\colon A \to \mathbb{R}$ be a function. Suppose that $f$ is represented
by a power series $\sum_{n=0}^{\infty} c_n (x-a)^n$. Let $R$ be the radius of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$.
Suppose that $R \neq \infty$. We know by hypothesis that $f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n$ for all
$x \in (a-R, a+R)$, because the interval of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$ contains
$(a-R, a+R)$. Additionally, we know by Corollary 10.4.5 that $f$ is continuous on
$(a-R, a+R)$. It might be the case that the interval of convergence of $\sum_{n=0}^{\infty} c_n (x-a)^n$
contains the endpoints $x = a-R$ or $x = a+R$. If it does, is $f$ necessarily continuous
at these endpoints? The purpose of this exercise is to prove Abel’s Theorem, which
says that the answer is yes. More specifically, given the above hypotheses, Abel’s
Theorem states that if $\sum_{n=0}^{\infty} c_n (x-a)^n$ is convergent at $x = a+R$, then
$\lim_{x \to (a+R)^-} f(x) = \sum_{n=0}^{\infty} c_n R^n$. A similar result holds for the left endpoint $x = a-R$, and we omit the
details.
The proof of the theorem will be done in steps, starting with a special case.

(1) Suppose that $a = 0$ and $R = 1$. Suppose that $\sum_{n=0}^{\infty} c_n x^n$ is convergent at $x =
a + R = 1$. Then $\sum_{n=0}^{\infty} c_n$ is convergent. Let $L = \sum_{n=0}^{\infty} c_n$. Let $\{s_n\}_{n=0}^{\infty}$ be the
sequence of partial sums of $\sum_{n=0}^{\infty} c_n$. Let $y \in (-1, 1)$, and let $p \in \mathbb{N}$. Prove that
$\sum_{n=0}^{p} c_n y^n = (1-y) \sum_{n=0}^{p-1} s_n y^n + s_p y^p$. [Use Exercise 5.7.6.]

(2) Assume the hypotheses of Part (1) of this exercise. Prove that $f(y) = (1-y) \sum_{n=0}^{\infty} s_n y^n$.

(3) Assume the hypotheses of Part (1) of this exercise. By Example 9.2.4 (4)
we know that $\sum_{n=0}^{\infty} y^n = \frac{1}{1-y}$, and hence $L = (1-y) \sum_{n=0}^{\infty} L y^n$. Let $\varepsilon > 0$.
Then there is some $N \in \mathbb{N}$ such that $n \in \mathbb{N}$ and $n \ge N$ imply $|s_n - L| < \frac{\varepsilon}{2}$.
Let $Q = \sum_{n=0}^{N} |s_n - L| + 1$. Then $Q > 0$. Let $\delta = \min\{\frac{\varepsilon}{2Q}, 1\}$. Prove that
if $x \in (-1,1)$ and $1 - \delta < x < 1$, then $|f(x) - L| < \varepsilon$. It will follow that
$\lim_{x \to 1^-} f(x) = L = \sum_{n=0}^{\infty} c_n$. [Use Exercise 9.4.2.]

(4) In Part (3) of this exercise we proved the theorem in the particular case where
$a = 0$ and $R = 1$. Deduce the general case from this special case.

(5) Here is an application of Abel’s Theorem. In Example 10.4.11 (2) we saw that

\[ \ln x = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} (x-1)^n \]

for all $x \in (0,2)$, and we saw that the interval of convergence of this power
series is $(0, 2]$. Prove that this power series formula for $\ln x$ holds for $x = 2$.
Deduce that $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n} = \ln 2$. (Another proof of this last fact was given
in Exercise 9.3.10 (2).)


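Exercise 10.4.11. [Used in Section 9.4.] Let $\sum_{n=0}^{\infty} c_n$ and $\sum_{n=0}^{\infty} d_n$ be series in $\mathbb{R}$.
Let $\sum_{n=0}^{\infty} e_n$ be the Cauchy product of $\sum_{n=0}^{\infty} c_n$ and $\sum_{n=0}^{\infty} d_n$. Suppose that $\sum_{n=0}^{\infty} c_n$
and $\sum_{n=0}^{\infty} d_n$ are convergent. Prove that if $\sum_{n=0}^{\infty} e_n$ is convergent, then
$\left[ \sum_{n=0}^{\infty} c_n \right] \cdot \left[ \sum_{n=0}^{\infty} d_n \right] = \sum_{n=0}^{\infty} e_n$. [Use Exercise 10.4.10.]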


Exercise 10.4.12. In Example 10.4.16 it was seen that $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ for all $x \in \mathbb{R}$.
Although it is a straightforward calculation to find the Maclaurin series for e*, a 
rigorous treatment of the subject requires a rigorous definition of the exponential 
function and proofs of the elementary properties of this function, which in turn requires 
a rigorous treatment of the natural logarithm function, as we saw in Section 7.2. 

The purpose of this exercise is to provide an alternative definition of the exponen- 
tial function by using the above power series as the definition of the function, and 
then using this definition to prove some of the standard properties of the exponential 
function. This approach can be used as the formal definition of the exponential
function for someone who has not read Section 7.2. Both approaches to the definition 
of the exponential function ultimately involve the same power series (in one case as 
the definition, and in the other case as a consequence of the definition), and hence 
they both yield the same function. The approach used in this exercise might appear 
to be less laborious than the definitions used in Section 7.2, though the brevity is 
somewhat illusory, because the use of power series requires our first having proved 
various results about such series; moreover, the use of power series as a definition of 
a function, while convenient in the present case, is not a definition that promotes an 
intuitive understanding of the function. 

For this exercise, do not use anything from Section 7.2. 

We know by Example 9.5.2 (2) that the interval of convergence of $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ is $\mathbb{R}$.
Let $E\colon \mathbb{R} \to \mathbb{R}$ be defined by $E(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ for all $x \in \mathbb{R}$.

(1) Prove that $E(0) = 1$, and that $E' = E$.

(2) It is evident that $E(x) > 0$ for all $x \in [0,\infty)$. Suppose that $E(p) < 0$ for some
$p \in (-\infty, 0)$. Obtain a contradiction by using Exercise 3.5.8 with $[a,b] = [p, 0]$
and r = 5 and the Mean Value Theorem (Theorem 4.4.4) on $[p, \operatorname{glb} S]$, where
$S$ is the set defined in Exercise 3.5.8. Deduce that $E(x) > 0$ for all $x \in \mathbb{R}$.
(3) Let $z \in \mathbb{R}$, and let $g\colon \mathbb{R} \to \mathbb{R}$ be defined by $g(x) = \frac{E(x+z)}{E(x)E(z)}$ for all $x \in \mathbb{R}$. Prove
that $g(x) = 1$ for all $x \in \mathbb{R}$. (Use Parts (1) and (2) of this exercise rather than
the definition of $E$ directly.) Deduce that $E(x+y) = E(x)E(y)$ for all $x, y \in \mathbb{R}$.


Exercise 10.4.13. This exercise is the analog for sine and cosine of what we saw for
the exponential function in Exercise 10.4.12.

In Exercise 10.4.6 it was seen that $\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$ for all $x \in \mathbb{R}$. A similar
argument can be used to show that $\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}$ for all $x \in \mathbb{R}$; we omit the
details. Although it is a straightforward calculation to find the Maclaurin series for
$\sin x$ and $\cos x$, a rigorous treatment of the subject requires a rigorous definition of
the sine and cosine functions and proofs of the elementary properties of these two
functions, which we saw in Section 7.3, but which were a bit more complicated than
might be expected.

The purpose of this exercise is to provide an alternative definition of sine and 
cosine by using the power series as the definitions of the functions, and then using 
these definitions to prove some of the standard properties of sine and cosine. Similarly 
to Exercise 10.4.12, this approach can be used as the formal definitions of sine and 
cosine for someone who has not read Section 7.3. 

For this exercise, do not use anything from Section 7.3, except where otherwise 
noted. 


(1) Prove that the interval of convergence for each of the power series $\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}$
and $\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$ is $\mathbb{R}$. Let $S, C\colon \mathbb{R} \to \mathbb{R}$ be defined by $S(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$
and $C(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}$ for all $x \in \mathbb{R}$.

(2) Prove that $S(0) = 0$ and $C(0) = 1$, and that $S(-x) = -S(x)$ and $C(-x) = C(x)$
for all $x \in \mathbb{R}$.

(3) Prove that $S' = C$ and that $C' = -S$. Deduce that $S''(x) + S(x) = 0$ and $C''(x) +
C(x) = 0$ for all $x \in \mathbb{R}$.

(4) Prove that $S^2(x) + C^2(x) = 1$ for all $x \in \mathbb{R}$. (Use Parts (2) and (3) of this
exercise rather than the definitions of $S$ and $C$ directly.)

(5) Prove that $S(x) > 0$ for all $x \in (0,2)$. Deduce that $C$ is strictly decreasing on
$(0,2)$.

(6) Prove that there is a unique $r \in (1,2)$ such that $C(r) = 0$. Define the number
$\pi$ by $\pi = 2r$.

(7) Let $y \in \mathbb{R}$. Then let $f\colon \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = S(x+y) - S(x)C(y) -
C(x)S(y)$ for all $x \in \mathbb{R}$. Prove that $f''(x) + f(x) = 0$ for all $x \in \mathbb{R}$, and that
$f(0) = 0$ and $f'(0) = 0$. Use Exercise 7.3.9 (1) (which does not refer to sine
and cosine at all) to deduce that $f(x) = 0$ for all $x \in \mathbb{R}$. Deduce that $S(x+y) =
S(x)C(y) + C(x)S(y)$ for the given value of $y$, and for all $x \in \mathbb{R}$. Because $y$
was arbitrarily chosen, we therefore see that $S(x+y) = S(x)C(y) + C(x)S(y)$
for all $x, y \in \mathbb{R}$. Deduce that $C(x+y) = C(x)C(y) - S(x)S(y)$ for all $x, y \in \mathbb{R}$.
(This part of the exercise is identical to the proof of Exercise 7.3.10.)

(8) Prove that $S\left(\frac{\pi}{2}\right) = 1$.

(9) Prove that $S\left(x + \frac{\pi}{2}\right) = C(x)$ and $C\left(x + \frac{\pi}{2}\right) = -S(x)$ for all $x \in \mathbb{R}$. Deduce
that $S(x + \pi) = -S(x)$ and $C(x + \pi) = -C(x)$ for all $x \in \mathbb{R}$. Deduce that
$S(x + 2\pi) = S(x)$ and $C(x + 2\pi) = C(x)$ for all $x \in \mathbb{R}$.


10.5 A Continuous but Nowhere Differentiable Function 


Our final section of the book takes us back to an issue that we first considered much 
earlier, which is the relation between differentiability and continuity. We know from 
Theorem 4.2.4 that if a function is differentiable everywhere, it must be continuous 
everywhere. On the other hand, we know from Example 4.2.3 (3) that a function 
f: R—R can be continuous everywhere but not differentiable everywhere. If a 
function is continuous everywhere, how large can the set of numbers at which the 
function is not differentiable be? The function in Example 4.2.3 (3), which is the 
absolute value function, is differentiable everywhere except at a single number. A 
function such as a “sawtooth function,” as seen in Figure 10.5.1, is differentiable everywhere except
at a discrete countable set of numbers. Are there continuous functions with even
worse non-differentiability? 


Fig. 10.5.1. 


Intuitively, it might appear that for any continuous function, between any two 
numbers at which the function is not differentiable there must be numbers (not to 
mention whole intervals) at which the function is differentiable. Rather astonishingly, 
as we will see in Theorem 10.5.2 below, it turns out there exist functions that are 
continuous everywhere, but differentiable nowhere. The statement of this theorem 
would have been understandable when we first discussed derivatives in Section 4.2, 
but the proof involves series of functions, and hence we had to wait until now to 
see it. The first example of a continuous but nowhere differentiable function is due 
to Bernard Bolzano (1781-1848) in the 1830s, but, as was the case for his other 
mathematical work, it did not receive attention from the mathematical community.
The first publicized example of such a function was due to Karl Weierstrass (1815- 
1897) in the 1870s, and it astonished the mathematical community at the time. We 
present a more recent variant of this example in the proof of Theorem 10.5.2. 

For our construction, we will need the concept of a periodic function, as discussed 
in Section 7.3. For the reader who has skipped that section, it is not necessary to read 
the entire section now, but only from Definition 7.3.1 to Lemma 7.3.4. 

The basic idea of the construction of a continuous but nowhere differentiable 
function is to start with an appropriately chosen periodic function f: R — R, and 
then to define a new function h: R — R by 


\[ h(x) = \sum_{n=0}^{\infty} \left( \frac{3}{4} \right)^n f(4^n x) \]

for all $x \in \mathbb{R}$. For each $n \in \mathbb{N}$, the factor of $4^n$ makes the function $f(4^n x)$ have a period
that is $\frac{1}{4^n}$ times the period of $f$, and it is these shrinking periods (that is, increasingly rapid
oscillations) that, intuitively, make the function $h$ nowhere differentiable; the factor of $\left( \frac{3}{4} \right)^n$ is designed to make
the series of functions be uniformly convergent, which is what makes the function
$h$ continuous. Of course, for this construction to work we will need to start with an
appropriately chosen function $f$, the existence of which is shown by the following
lemma. This lemma is more general than we need, because we need only one example 
of the type of function given by the lemma, but if one wants to put in all of the details 
for the existence of one such function, it is no more effort to prove the lemma more 
generally, and doing so helps clarify what the real issues are. 


Lemma 10.5.1. Let $K \in (0,\infty)$. Then there is a function $f\colon \mathbb{R} \to \mathbb{R}$ that satisfies the
following properties.

(a) $f$ is periodic with period 2;

(b) $f$ is continuous;

(c) $f$ is bounded;

(d) if $x, y \in \mathbb{R}$, then $|f(x) - f(y)| \le K|x - y|$;

(e) if $x, y \in \mathbb{R}$ and there is no integer strictly between $x$ and $y$, then
$\frac{2K}{3}|x - y| \le |f(x) - f(y)|$.


Proof. Let $g\colon [0,1] \to \mathbb{R}$ be a function such that $g$ is continuous on $[0,1]$ and
differentiable on $(0,1)$, and that $\frac{2K}{3} \le |g'(x)| \le K$ for all $x \in (0,1)$. There are many such
functions, for example the function defined by $g(x) = Kx$ for all $x \in [0,1]$; the reader
is asked to find another example of such a function $g$ in Exercise 10.5.1. Any choice
of such a function $g$ will work.

Let $h\colon [-1,1] \to \mathbb{R}$ be defined by $h(x) = g(|x|)$ for all $x \in [-1,1]$. It follows from
Exercise 3.3.1 (2) and Theorem 3.3.8 (3) that $h$ is continuous. If $x \in [-1,1]$, then
$h(-x) = g(|-x|) = g(|x|) = h(x)$; in particular we see that $h(-1) = h(1)$.

By Lemma 7.3.2 (2) there is a unique periodic function $f\colon \mathbb{R} \to \mathbb{R}$ with period
2 such that $f|_{[-1,1]} = h$. Hence Part (a) holds. By Lemma 7.3.4 (1) the function $f$ is
continuous and bounded, and hence Parts (b) and (c) hold.


We now prove that Part (d) holds. We start by observing that Exercise 4.4.6 implies
that the analog of Part (d) holds for $g$.

Next, we show that the analog of Part (d) holds for $h$. Let $w, z \in [-1,1]$. Then
$|h(w) - h(z)| = |g(|w|) - g(|z|)| \le K\,\bigl||w| - |z|\bigr|$. By Lemma 2.3.9 (7) we see that
$|w| - |z| \le |w - z|$ and $|z| - |w| \le |z - w| = |w - z|$, and hence $\bigl||w| - |z|\bigr| \le |w - z|$.
Therefore $|h(w) - h(z)| \le K|w - z|$.

We now show that Part (d) holds for $f$. Let $x, y \in \mathbb{R}$. If $x = y$, then $|f(x) - f(y)| =
0 \le K|x - y|$. Now suppose that $x \ne y$. Without loss of generality, assume that $x < y$.
We claim that there are $\hat{x}, \hat{y} \in [-1,1]$ such that $f(\hat{x}) = f(x)$ and $f(\hat{y}) = f(y)$ and
$|\hat{x} - \hat{y}| \le |x - y|$. It will then follow that $|f(x) - f(y)| = |f(\hat{x}) - f(\hat{y})| = |h(\hat{x}) - h(\hat{y})| \le
K|\hat{x} - \hat{y}| \le K|x - y|$, and the proof of Part (d) will be complete.

By Exercise 2.6.14 (1) there are unique $n, m \in \mathbb{Z}$ such that $(-1) + 2(m-1) \le x <
(-1) + 2m$ and $(-1) + 2(n-1) \le y < (-1) + 2n$. It follows that $x - 2(m-1) \in [-1,1)$
and $y - 2(n-1) \in [-1,1)$. Let $\hat{y} = y - 2(n-1)$, let $x' = x - 2(n-1)$ and let $\tilde{x} =
x - 2(m-1)$. Then $\tilde{x}, \hat{y} \in [-1,1)$, and $\hat{y} - x' = y - x$. Because $f$ is periodic with
period 2, then $f(\tilde{x}) = f(x') = f(x)$ and $f(\hat{y}) = f(y)$.

There are three cases. First, suppose that $|x - y| \ge 2$. Let $\hat{x} = \tilde{x}$. Then $f(\hat{x}) = f(x)$.
Because $\hat{x}, \hat{y} \in [-1,1)$, then $|\hat{x} - \hat{y}| < 2 \le |x - y|$.

Second, suppose that $x' \in [-1,1)$. Let $\hat{x} = x'$. Then $f(\hat{x}) = f(x)$, and $|\hat{x} - \hat{y}| =
|x' - \hat{y}| = |x - y|$.

Third, suppose that $|x - y| < 2$ and $x' \notin [-1,1)$. Let $\hat{x} = -x' - 2$. Because $x < y$
then $x' < \hat{y}$. Because $\hat{y} \in [-1,1)$ and $|x' - \hat{y}| = |x - y| < 2$, then $-3 < x' < -1$. It
follows that $-1 < \hat{x} < 1$, which means that $\hat{x} \in [-1,1]$. Using what we saw above
about $h$, and the fact that $f$ is periodic with period 2, we see that $f(\hat{x}) = h(\hat{x}) =
h(-\hat{x}) = h(x' + 2) = f(x' + 2) = f(x') = f(x)$. Also, we observe that $-2 < x' + 1 < 0$
and $0 \le \hat{y} + 1 < 2$. Then $|\hat{x} - \hat{y}| = |(-x' - 2) - \hat{y}| = |-(x' + 1) - (\hat{y} + 1)| \le |-(x' + 1)|
+ |\hat{y} + 1| = -(x' + 1) + (\hat{y} + 1) = \hat{y} - x' = y - x = |x - y|$.

Finally, we prove that Part (e) holds. Let $x, y \in \mathbb{R}$. Suppose that there is no integer
strictly between $x$ and $y$. If $x = y$ then Part (e) is trivially true, so suppose that $x \ne y$.
Without loss of generality, assume that $x < y$. Let $x'$ and $\hat{y}$ be as in the proof of Part (d)
of this lemma. Then $\hat{y} \in [-1,1)$ and $x' < \hat{y}$.

We claim that there is no integer strictly between $x'$ and $\hat{y}$. Suppose to the contrary
that $x' < p < \hat{y}$ for some $p \in \mathbb{Z}$. Then $x = x' + 2(n-1) < p + 2(n-1) < \hat{y} + 2(n-1) =
y$, which is a contradiction to the fact that there is no integer strictly between $x$ and $y$.

There are now two cases. First, suppose that $\hat{y} \in (-1,1)$. If it were the case that
$x' < -1$, then we would have $x' < -1 < \hat{y}$, which is a contradiction. Hence $-1 \le x'$,
and therefore $x' \in [-1,1)$. It cannot be the case that $x' < 0 < \hat{y}$, and therefore either
$x', \hat{y} \in [-1,0]$ or $x', \hat{y} \in [0,1)$.

Suppose that $x', \hat{y} \in [0,1)$. Because $g$ is continuous on $[0,1]$ and differentiable
on $(0,1)$, we can apply the Mean Value Theorem (Theorem 4.4.4) to $g|_{[x',\hat{y}]}$, and we
deduce that there is some $c \in (x', \hat{y})$ such that

$$g'(c) = \frac{g(\hat{y}) - g(x')}{\hat{y} - x'}.$$

Because $\frac{2K}{3} \le |g'(c)|$, it follows that

$$\frac{2K}{3} \le \frac{|g(\hat{y}) - g(x')|}{|\hat{y} - x'|}.$$

Therefore $\frac{2K}{3}|\hat{y} - x'| \le |g(\hat{y}) - g(x')|$. We know that $\hat{y} - x' = y - x$, and that $f(x) =
f(x') = h(x') = g(x')$ and $f(y) = f(\hat{y}) = h(\hat{y}) = g(\hat{y})$. It follows that $\frac{2K}{3}|y - x| \le
|f(y) - f(x)|$, which is Part (e).

Next, suppose that $x', \hat{y} \in [-1,0]$. Then $-x', -\hat{y} \in [0,1]$. By the same argument
as in the previous paragraph, we see that $\frac{2K}{3}|(-\hat{y}) - (-x')| \le |g(-\hat{y}) - g(-x')|$. We
then observe that $|(-\hat{y}) - (-x')| = |y - x|$, and that $f(x) = f(x') = h(x') = g(-x')$ and
$f(y) = f(\hat{y}) = h(\hat{y}) = g(-\hat{y})$. Hence $\frac{2K}{3}|y - x| \le |f(y) - f(x)|$, which again is Part (e).

Second, suppose that $\hat{y} = -1$. Let $y' = \hat{y} + 2$, and let $x'' = x' + 2$. Then $y' = 1$ and
$x'' < y'$. The desired result now follows from an argument similar to the previous case,
but with $x''$ and $y'$ replacing $x'$ and $\hat{y}$, and we omit the details.
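
As a numerical sanity check of Lemma 10.5.1 (not a substitute for the proof), the following Python sketch takes $K = 1$ and $g(x) = x$, so that $f$ is the periodic function with period 2 that equals the absolute value function on $[-1,1]$, and tests Parts (d) and (e) on randomly sampled pairs of numbers. The names `sawtooth`, `no_integer_strictly_between` and `check_lemma` are illustrative choices that do not appear in the text, and the small tolerance is an assumption made to absorb floating-point rounding.

```python
import math
import random

K = 1.0  # assumed constant: g(x) = x gives |g'(x)| = 1 on (0, 1)


def sawtooth(x: float) -> float:
    """Period-2 extension of |x| on [-1, 1]; this is f for g(x) = x."""
    # Shift x into [-1, 1) using the period, then take the absolute value.
    return abs((x + 1.0) % 2.0 - 1.0)


def no_integer_strictly_between(x: float, y: float) -> bool:
    """True when no integer p satisfies min(x, y) < p < max(x, y)."""
    lo, hi = min(x, y), max(x, y)
    # The smallest integer strictly greater than lo is floor(lo) + 1.
    return math.floor(lo) + 1 >= hi


def check_lemma(samples: int = 10_000, seed: int = 0) -> bool:
    """Test Parts (d) and (e) of the lemma on random pairs (x, y)."""
    rng = random.Random(seed)
    tol = 1e-9  # assumed slack for floating-point rounding
    for _ in range(samples):
        x = rng.uniform(-10.0, 10.0)
        y = x + rng.uniform(-3.0, 3.0)
        gap = abs(x - y)
        diff = abs(sawtooth(x) - sawtooth(y))
        if diff > K * gap + tol:
            return False  # Part (d) would fail here
        if no_integer_strictly_between(x, y) and diff < (2 * K / 3) * gap - tol:
            return False  # Part (e) would fail here
    return True


if __name__ == "__main__":
    print(check_lemma())
```

A passing check is only evidence on the sampled pairs; the lemma itself is what guarantees the two inequalities for all real numbers.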


We now come to the grand finale of this text. 


Theorem 10.5.2. There is a function $h\colon \mathbb{R} \to \mathbb{R}$ that is continuous but nowhere
differentiable.


Proof. We follow [Rud76, p. 154], which is a variant of [McC53], which in turn is
related to [vdW30]. Let $K \in (0,\infty)$, and let $f\colon \mathbb{R} \to \mathbb{R}$ be a function the existence of
which is guaranteed by Lemma 10.5.1. By Part (c) of that lemma there is some $B \in \mathbb{R}$
such that $|f(x)| \le B$ for all $x \in \mathbb{R}$.

For each $n \in \mathbb{N} \cup \{0\}$, let $f_n\colon \mathbb{R} \to \mathbb{R}$ be defined by $f_n(x) = \left(\frac{3}{4}\right)^n f(4^n x)$ for all
$x \in \mathbb{R}$. Then $|f_n(x)| \le B\left(\frac{3}{4}\right)^n$ for all $x \in \mathbb{R}$ and all $n \in \mathbb{N} \cup \{0\}$. By Example 9.2.4 (4)
we know that $\sum_{n=0}^{\infty} B\left(\frac{3}{4}\right)^n$ is convergent; the fact that we start at $n = 0$ rather than $n = 1$
makes no difference. It now follows from the Weierstrass M-Test (Theorem 10.3.6)
that $\sum_{n=0}^{\infty} f_n$ is uniformly convergent.

Let $h\colon \mathbb{R} \to \mathbb{R}$ be defined by $h = \sum_{n=0}^{\infty} f_n$, which means that

$$h(x) = \sum_{n=0}^{\infty} \left(\frac{3}{4}\right)^n f(4^n x)$$

for all $x \in \mathbb{R}$.

By Part (b) of Lemma 10.5.1 the function $f$ is continuous. It then follows from
Example 3.3.3 (1), Theorem 3.3.5 (3) and Theorem 3.3.8 (3) that $f_n$ is continuous for
each $n \in \mathbb{N} \cup \{0\}$. By Theorem 10.3.9 we deduce that $h$ is continuous.

Let $x \in \mathbb{R}$. We will show that there is a sequence $\{d_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ such that
$\lim_{n \to \infty} d_n = 0$, and that the sequence

$$\left\{\frac{h(x + d_n) - h(x)}{d_n}\right\}_{n=1}^{\infty}$$

is divergent. It will then follow from Exercise 8.4.3 that $h$ is not differentiable at $x$,
and we will conclude that $h$ is nowhere differentiable.

Let $n \in \mathbb{N}$. We define $d_n$ as follows. By Exercise 2.4.2 we know that at most
one of $(4^n x - \frac{1}{2}, 4^n x)$ and $(4^n x, 4^n x + \frac{1}{2})$ contains an integer. If $(4^n x, 4^n x + \frac{1}{2})$ does
not contain an integer, then let $d_n = \frac{1}{2 \cdot 4^n}$. If $(4^n x, 4^n x + \frac{1}{2})$ contains an integer, then
$(4^n x - \frac{1}{2}, 4^n x)$ does not contain an integer, and let $d_n = -\frac{1}{2 \cdot 4^n}$. It follows that there is
no integer strictly between $4^n x$ and $4^n(x + d_n) = 4^n x \pm \frac{1}{2}$, where the choice of plus or
minus depends upon the definition of $d_n$. Using Example 8.2.13 and Theorem 8.2.9 (3)
we see that $\lim_{n \to \infty} |d_n| = \lim_{n \to \infty} \frac{1}{2 \cdot 4^n} = 0$, and it then follows from Exercise 8.2.13 (2) that
$\lim_{n \to \infty} d_n = 0$.

Let $k \in \mathbb{N} \cup \{0\}$. We examine the value of $|f(4^k(x + d_n)) - f(4^k x)|$. There are three
cases.

First, suppose that $k < n$. By Part (d) of Lemma 10.5.1 we see that $|f(4^k(x +
d_n)) - f(4^k x)| \le K|4^k(x + d_n) - 4^k x| = K|4^k d_n| = \frac{K}{2 \cdot 4^{n-k}}$.

Second, suppose that $k = n$. Because there is no integer strictly between $4^n x$ and
$4^n(x + d_n)$, it follows from Part (e) of Lemma 10.5.1 that $|f(4^n(x + d_n)) - f(4^n x)| \ge
\frac{2K}{3}|4^n(x + d_n) - 4^n x| = \frac{2K}{3} \cdot \frac{1}{2} = \frac{K}{3}$.

Third, suppose that $k > n$. Then $|4^k(x + d_n) - 4^k x| = |4^k d_n| = \frac{4^{k-n}}{2}$, which is an
integer multiple of 2. By Part (a) of Lemma 10.5.1 the function $f$ is periodic with
period 2, and hence $|f(4^k(x + d_n)) - f(4^k x)| = 0$.

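Using the above values for $|f(4^k(x + d_n)) - f(4^k x)|$, and using Theorem 9.2.6 (2),
Lemma 2.3.9 (7), Exercise 2.5.3 and Exercise 2.5.12 (3), we see that

$$
\begin{aligned}
\left|\frac{h(x + d_n) - h(x)}{d_n}\right|
&= \frac{1}{|d_n|}\left|\sum_{k=0}^{\infty}\left(\frac{3}{4}\right)^k f(4^k(x + d_n)) - \sum_{k=0}^{\infty}\left(\frac{3}{4}\right)^k f(4^k x)\right| \\
&= 2 \cdot 4^n \left|\sum_{k=0}^{n}\left(\frac{3}{4}\right)^k \left[f(4^k(x + d_n)) - f(4^k x)\right]\right| \\
&\ge 2 \cdot 4^n \left\{\left(\frac{3}{4}\right)^n \left|f(4^n(x + d_n)) - f(4^n x)\right| - \sum_{k=0}^{n-1}\left(\frac{3}{4}\right)^k \left|f(4^k(x + d_n)) - f(4^k x)\right|\right\} \\
&\ge 2 \cdot 4^n \left\{\left(\frac{3}{4}\right)^n \frac{K}{3} - \sum_{k=0}^{n-1}\left(\frac{3}{4}\right)^k \frac{K}{2 \cdot 4^{n-k}}\right\} \\
&= \frac{K}{6} 3^n + \frac{K}{2}.
\end{aligned}
$$

It follows from Exercise 8.2.16 and Exercise 8.2.17 that $\lim_{n \to \infty} \left(\frac{K}{6} 3^n + \frac{K}{2}\right) = \infty$, and
we then use Exercise 8.2.18 (2) to deduce that

$$\left\{\frac{h(x + d_n) - h(x)}{d_n}\right\}_{n=1}^{\infty}$$

is divergent.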




It is not possible to draw the graph of the function $h$ defined in the proof of
Theorem 10.5.2, but it is possible to draw the graphs of the partial sums of $h$. In
Figure 10.5.2 we see the first four partial sums of $h$ restricted to $[0,2]$, where the
function $f\colon \mathbb{R} \to \mathbb{R}$ used in the definition of $h$ is chosen to be the “sawtooth function”
seen in Figure 10.5.1; that is, the function $f$ is the periodic function with period 2 that
equals the absolute value function on $[-1,1]$.
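
The estimate at the end of the proof of Theorem 10.5.2 can also be explored numerically. The following Python sketch assumes the sawtooth choice of $f$ (so that $K = 1$ in Lemma 10.5.1), picks $d_n$ as in the proof, and compares the absolute value of the difference quotient with the lower bound $\frac{K}{6} 3^n + \frac{K}{2}$. The function names and the sample point $x = 0.3$ are illustrative assumptions, and floating-point arithmetic limits how large $n$ can sensibly be taken.

```python
import math


def sawtooth(x: float) -> float:
    """Period-2 extension of |x| on [-1, 1] (so K = 1 in Lemma 10.5.1)."""
    return abs((x + 1.0) % 2.0 - 1.0)


def partial_sum(x: float, terms: int) -> float:
    """Sum of (3/4)^k * f(4^k x) for k = 0, ..., terms - 1."""
    return sum(0.75 ** k * sawtooth(4.0 ** k * x) for k in range(terms))


def step(x: float, n: int) -> float:
    """The number d_n, equal to 1/(2 * 4^n) up to sign, chosen as in the proof."""
    t = 4.0 ** n * x
    # (t, t + 1/2) contains an integer exactly when floor(t) + 1 < t + 1/2.
    sign = -1.0 if math.floor(t) + 1 < t + 0.5 else 1.0
    return sign / (2.0 * 4.0 ** n)


def difference_quotient(x: float, n: int) -> float:
    """(h(x + d_n) - h(x)) / d_n, computed from the terms with k <= n.

    The terms with k > n cancel because 4^k * d_n is then a multiple of
    the period 2, so truncating the series loses nothing in exact arithmetic.
    """
    d = step(x, n)
    return (partial_sum(x + d, n + 1) - partial_sum(x, n + 1)) / d


if __name__ == "__main__":
    x = 0.3  # an arbitrary sample point
    for n in range(1, 9):
        bound = 3.0 ** n / 6.0 + 0.5  # (K/6) 3^n + K/2 with K = 1
        print(n, difference_quotient(x, n), bound)
```

The printed quotients grow in absolute value at least as fast as the bound, which illustrates, but of course does not prove, that the difference quotients along $\{d_n\}_{n=1}^{\infty}$ cannot converge.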


[Fig. 10.5.2. Panels (i) to (iv): graphs of the first four partial sums of h restricted to [0, 2].]


Reflections


We have now come to the end of this book, though of course not to the end of the 
story we are telling. This book is only an introduction to real analysis, and while we 
have covered most of those aspects of real analysis treated in a single-variable calculus 
course, the bulk of real analysis awaits the reader in subsequent texts, including both 
the study of more advanced aspects of single-variable real analysis, for example 
Lebesgue measure and integration, and the study of real analysis in more general
settings, for example in $\mathbb{R}^n$ and metric spaces. Moreover, some of the topics covered in
this text lead naturally to issues that arise in other branches of mathematics, including 
complex analysis, point set topology, probability and differential equations. The reader 
is encouraged to pursue the study of these fields. 

Whatever branch of mathematics one chooses to pursue, the existence of counter- 
intuitive functions such as the one given in Theorem 10.5.2 shows why we need to 
use all the rigor available to us, because things do not always work out as our intuition 
might tell us. Of course, that is what makes mathematics so interesting—if everything 
worked as our intuition told us, then there would be no surprises. As the reader might 
imagine, a further study of mathematics will yield many additional surprises as well. 


Exercises 


Exercise 10.5.1. [Used in Lemma 10.5.1.] Let $K \in (0,\infty)$. Find an example of a
function $r\colon [0,1] \to \mathbb{R}$ such that $r$ is continuous on $[0,1]$ and differentiable on $(0,1)$,
that $\frac{2K}{3} \le |r'(x)| \le K$ for all $x \in (0,1)$ and that $r$ does not have a constant derivative.
Give an explicit formula for $r$.


Exercise 10.5.2. The definition of the function $h$ in the proof of Theorem 10.5.2 is
rather tricky, and the reader might wonder whether there is a simpler way to define 
a continuous but nowhere differentiable function. It seems unlikely that it would 
be possible to define such a function without some sort of limit, but would it at 
least be possible to give a simpler construction? Of course, if a substantially simpler 
construction were known, it would be used instead of the very standard construction 
given above. Nonetheless, it might occur to the reader to try the following idea. 

Let $f\colon \mathbb{R} \to \mathbb{R}$ be the “sawtooth function” seen in Figure 10.5.1, and let $\{f_n\}_{n=1}^{\infty}$
be the sequence of functions $\mathbb{R} \to \mathbb{R}$ defined by $f_n(x) = f(nx)$ for all $x \in \mathbb{R}$ and for all
$n \in \mathbb{N}$. Intuitively, the function $f_n$ is similar to $f$, except that it oscillates $n$ times faster;
in other words, whereas each “tooth” in the graph of $f$ has width 2 and height 1, each
“tooth” in the graph of $f_n$ has width $\frac{2}{n}$ and height 1. The function $f_n$ is continuous for
each $n \in \mathbb{N}$, but as $n$ gets larger, the function $f_n$ has more and more numbers at which
it is not differentiable. Hence, one might wonder if the sequence $\{f_n\}_{n=1}^{\infty}$ converges
pointwise to a function $f\colon [0,1] \to \mathbb{R}$ that is continuous but nowhere differentiable.
Prove that that does not happen; that is, prove that either $\{f_n\}_{n=1}^{\infty}$ converges pointwise
to a function that is not continuous, or that $\{f_n\}_{n=1}^{\infty}$ converges pointwise to a function
that is continuous everywhere and differentiable somewhere, or that $\{f_n\}_{n=1}^{\infty}$ is not
pointwise convergent.


Exercise 10.5.3. [Used in Example 5.9.16.] The purpose of this exercise is to show 
that there exists a continuous function that is not rectifiable; the latter concept was 
discussed in Section 5.9. Let h: R — R be the function defined in the proof of Theo- 
rem 10.5.2, where the function f: R — R used in the definition of h is chosen to be 
the “sawtooth function” seen in Figure 10.5.1; that is, the function f is the periodic 
function with period 2 that equals the absolute value function on $[-1,1]$. Then $h|_{[0,1]}$
is continuous by Theorem 10.5.2 and Exercise 3.3.2 (2).

(1) For each $n \in \mathbb{N} \cup \{0\}$, let $s_n\colon [0,1] \to \mathbb{R}$ be defined by $s_n(x) = \sum_{k=0}^{n} \left(\frac{3}{4}\right)^k f(4^k x)$
for all $x \in [0,1]$. Let $n \in \mathbb{N} \cup \{0\}$. The graph of $s_n$ is a polygon. Prove that the
ratio of the length of the graph of $s_{n+1}$ to the length of the graph of $s_n$ is greater
than 2.

(2) Prove that the length of the graph of $s_n$ is the polygonal sum of $h|_{[0,1]}$ with
respect to some partition of $[0,1]$.

(3) Prove that $h|_{[0,1]}$ is not rectifiable.


10.6 Historical Remarks 


Whereas calculus courses today typically treat series, and in particular power series, 
after first covering the basics of differentiation and integration (though some real 
analysis texts put series, but not power series, first), in fact some discoveries about 
power series predate the invention of calculus, and indeed might have contributed to 
that invention. Moreover, whereas the modern approach to the material in the present 
chapter starts with a discussion of sequences and series of functions in general, and 
only then turns to the study of power series as an application of the properties of 
series of functions, from a historical perspective power series were studied, and used 
as a computational tool, long before the rigorous study of sequences and series of 
functions. 


Renaissance 


Power series for sin x, cos x and arctan x appeared in Tantrasamgraha-vyakhya of
around 1530, which was a commentary on Tantrasamgraha of around 1501 by
Nilakantha Somayaji (1444-1544). It was recognized in Tantrasamgraha-vyakhya
that the power series for arctan x is convergent only when $|x| \le 1$. A derivation of
these power series is found in Yuktibhasa of 1550 by Jyesthadeva (c. 1500-c. 1575),
which is based mainly on Tantrasamgraha-vyakhya, and which attributes the series
for arctan x to Madhava of Sangamagramma (1340-1425).


Seventeenth Century 


Nicolaus Mercator (1620-1687), not to be confused with the inventor of the Mercator
projection Gerardus Mercator, had the power series $\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots$ in
Logarithmotechnia of 1668. Isaac Newton (1643-1727) had found this series earlier,
but Mercator was the first to publish it. James Gregory (1638-1675), in Exercitationes
Geometricae of 1668, and Edmond Halley (1656-1742), in A most compendious and
facile method for constructing logarithms, exemplified and demonstrated from the
nature of numbers, without any regard to the hyperbola, with a speedy method for
finding the number from the logarithm given of 1697, gave the series

$$\ln\left(\frac{1 + x}{1 - x}\right) = 2x + \frac{2x^3}{3} + \frac{2x^5}{5} + \cdots,$$

which converges faster than Mercator’s series. In the early 1670s
Gregory stated the first few terms of various power series, including those for tan x,
arcsin x and ln sec x, though he did not give proofs.


Newton and Leibniz


Power series were an important part of Newton’s approach to calculus. In fact, his first 
significant mathematical discovery, made in the mid-1660s before he invented calculus,
was the binomial series, which built upon the ideas of John Wallis (1616-1703) in 
Arithmetica infinitorum of 1656. Besides the value of the series itself, which Newton 
used for some calculations in his calculus, the binomial series was important in that 
it helped make infinite processes acceptable. Moreover, although Wallis referred to 
fractional and negative exponents, he did not really use them, and Newton was the 
first to do so. 

In De analysi per aequationes numero terminorum infinitas of 1669 (published 
only in 1711), Newton, in part spurred by Mercator’s publication of Logarithmotech- 
nia of 1668, discussed power series, essentially stating without proof that power 
series can be manipulated as we manipulate finite sums, for example that they can 
be integrated term-by-term. Newton used his methods to find the power series for $e^x$,
sin x and cos x, the latter two by using geometric ideas and term-by-term integration
to find the power series for y = arcsin x and y = arccos x, and then solving for x in
terms of y. An important idea for Newton was the analogy between infinite decimal 
fractions, the widespread use of which was fairly recent, and infinite series; Newton 
used ideas from arithmetic (for example long division) and applied them to find the 
power series of some functions. In the 1691-1692 draft of De Quadratura Curvarum, 
though omitted from the final version that was printed as an appendix to The Opticks 
of 1704, Newton gave the first explicit statement of the general formula for Taylor 
series, though without proof; as with other aspects of his work, this formula was first 
published by someone else. 

Although Newton did not have our notion of convergence, and in particular he did 
not have the concept of the interval of convergence for power series, he seemed to be 
aware that convergence was an issue. For example, in using power series to solve a 
differential equation, he noted that the method worked for small values of x, which 
perhaps implicitly recognized that the power series was not convergent for all values 
of x. On the other hand, Newton’s manipulation of series, similarly to Leibniz, was 
mostly formal, without regard to convergence. Moreover, both Newton and Leibniz 
thought geometrically in terms of curves rather than functions (the idea of which 
came later), and so if a power series gave a geometrically meaningful answer, the 
convergence was implicitly assumed. That is, Newton considered convergence only 
when applying the power series, but not in the preliminary manipulations. 

Gottfried von Leibniz (1646-1716) was also concerned with power series. In 1691
Leibniz gave power series for $\ln(1 + x)$, $\arctan x$, $\sin x$, $\cos x - 1$ and $e^x - 1$. In 1693
he used power series to solve differential equations by substituting the power series
$\sum_{n=0}^{\infty} c_n (x - a)^n$ into the differential equation and then finding a recurrence relation for
the coefficients. The basis for this method, as used by Leibniz, is that a power series
$\sum_{n=0}^{\infty} c_n (x - a)^n$ is zero on an open interval if and only if $c_n = 0$ for all $n \in \mathbb{N} \cup \{0\}$
(which follows from Theorem 9.5.8).


Eighteenth Century


A number of mathematicians, including Gregory, Newton, Leibniz and others, essen- 
tially knew the formula for Taylor series before Brook Taylor (1685-1731) was the 
first to publish it in 1715. Taylor used an interpolation formula of Gregory and Newton 
to justify the formula. In 1717 James Stirling (1692-1770) gave the proof of Taylor 
series (actually just Maclaurin series) that we use today via successive differentiation. 
Colin Maclaurin (1698-1746) also derived Taylor series by successive differentiation, 
and he applied such series to the study of maxima and minima, in Treatise of Fluxions 
of 1742, which was the first systematic account of Newton’s approach to calculus, 
and which was written to defend that approach from Berkeley’s criticism of the use of 
infinitesimals. 

Power series were an important tool for representing functions in the 18th cen- 
tury. Such series were thought of as infinite polynomials, and it was assumed that 
they behaved essentially the same way as polynomials, which allowed, for example, 
term-by-term differentiation and integration of power series. Convergence was not 
the primary focus when dealing with power series, and mathematicians felt free to 
manipulate power series even outside the interval of convergence. 

Joseph-Louis Lagrange (1736-1813), in his attempt to sidestep Berkeley’s criti- 
cism of infinitesimals, took an approach to calculus based upon the representation of 
functions by power series; it was assumed that every function (as yet a loosely defined 
concept) could be so represented, and it was only later that Cauchy showed that not 
every function could be represented by a power series on an open interval. Lagrange 
proved that if a function is represented as a power series, then the power series is the
Taylor series of the function, and he gave what we now call the Lagrange form of the 
remainder for Taylor polynomials. 

Leonhard Euler (1707-1783), in the influential textbook Introductio in analysin 
infinitorum of 1748, introduced the modern definition of logarithms as the inverse of 
the exponential functions, and the modern definition of sine and cosine in terms of 
the unit circle, and he then computed the power series of $e^x$, $\ln(1 + x)$, $\sin x$ and $\cos x$,
though not via the formula for Taylor series, but by clever use of the binomial series. 


Nineteenth Century 


Bernard Bolzano (1781-1848) gave the first example of a continuous but nowhere 
differentiable function in the 1830s; as with other aspects of Bolzano’s work, this 
example did not attract the attention of contemporary mathematicians. 

Although Taylor series were known and used in the period when calculus was 
first invented, it was Augustin Louis Cauchy (1789-1857), in Résumé des leçons à
l’École Royale Polytechnique of 1823, who first proved that some examples of Taylor
series actually converged to the original functions. Cauchy gave a criterion for such 
convergence, using an integral form of the remainder, and he applied his method to 
prove that the Taylor series for sin x and cos x converge to these functions for all x.
Cauchy gave an example of a non-zero function h with a zero Taylor series (which is
in Example 10.4.11 (4)); he then pointed out that two different functions can have the
same Taylor series by considering any function f with a non-zero Taylor series, and
then comparing f and f + h. Hence, in contrast to Lagrange, Cauchy stated that it is
not possible to use functions and Taylor series interchangeably. Cauchy attempted to 
prove that series of functions can be integrated term-by-term, something that had been 
previously assumed because of the general assumption that what works for finite sums 
works for series. However, in this proof, as in some other proofs, Cauchy implicitly 
used uniform convergence, which his definition of convergence did not explicitly 
state. 

Cauchy incorrectly asserted that a convergent series of continuous functions is 
itself continuous in Cours d’analyse à l’École Royale Polytechnique of 1821. Niels
Henrik Abel (1802-1829) read Cauchy’s definition of convergence of series to mean 
pointwise convergence of series of functions, rather than uniform convergence as 
Cauchy understood it, and he then found an example of a series of continuous functions 
that converges pointwise to a discontinuous function. Abel also gave the first rigorous 
proof of the binomial series in 1826. 

Karl Weierstrass (1815-1897) clarified the difference between pointwise conver- 
gence and uniform convergence of series of functions, and showed that termwise 
differentiation and integration works nicely with uniform convergence, but not with 
pointwise convergence. In his lectures in 1872, Weierstrass gave an example of a 
continuous but nowhere differentiable function, and, in contrast to Bolzano’s ear- 
lier example, this one was widely seen. Weierstrass’ example is similar, though not 
identical, to the one we use in this text (which is the standard such example used 
today). 


Dirichlet, Lejeune, 177
discontinuity, 147 
discontinuous, 147 

at a point, 147 


Distributive Law, 6, 13, 20, 29, 43, 63 


divergent 
improper integral, 343, 344, 346 
sequence, 402 




series, 445 
diverges 
to infinity, 325, 331 
sequence, 408 
series, 445 
to infinity from the left, 326 
to infinity from the right, 326 
to negative infinity, 325 
sequence, 408 
series, 445 
to negative infinity from the left, 326 
to negative infinity from the right, 326 
division 
rational numbers, 31 
real numbers, 66 
Division Algorithm, 121 


element 

greatest, 92 
endpoint, 70 

left, 70 

right, 70 
equation 

differential, 379, 507 
Euclid, 52, 226, 392, 439, 483 
Eudoxus of Cnidus, 53, 313, 439 
Euler’s constant, 436 
Euler, Leonhard, 177, 355, 394, 440, 486, 

524, 536 

Euler—Mascheroni constant, 436 
even 

function, 265, 523 

integers, 128 
eventually repeating base p representation, 

122 

existence 

theorem, 121 
explicit description, 86 
exponential function, 361 

with base a, 366 
extended real numbers, 328 
extension 

function, 109 

periodic, 372 
Extreme Value Theorem, 163, 212 
extremum 

global, 209 

local, 209 
factorial, 91, 475, 514
Fermat, Pierre de, 56, 227, 315 
Fibonacci, 53, 393 
Fibonacci numbers, 431 
field, 30, 62 

ordered, 30, 47, 62 
fixed point, 170 
form 

indeterminate, 332 
formal sum, 444 
Fourier, Joseph, 177, 318, 486 
Frege, Gottlob, 59 
function 

bound, 137 

bounded, 137 

convex, 225 

even, 265, 523 

extension, 109 

odd, 265 

periodic, 371, 528 

polynomial, 90 

sawtooth, 527, 533 

step, 247 
Fundamental Theorem 

of Algebra, 33 

of Arithmetic, 150 

of Calculus 

Version I, 269 
Version II, 272 


Galilei, Galileo, 174, 315 
gamma function, 354 
gauge integral, 231 
Gauss, Carl Friedrich, 57, 178, 355, 394, 440, 
486 
generalized Riemann integral, 231 
geometric series, 285, 447 
Gerbert of Aurillac, 54 
global 
extremum, 209 
maximum, 209 
minimum, 209 
golden ratio, 432, 524 
greatest 
element, 92 
lower bound, 47, 63 
Greatest Lower Bound Property, 47, 97 
Grégoire de Saint-Vincent, 174, 397, 484 
Gregory of Rimini, 440 


Gregory, James, 227, 316, 393 


half-open interval, 70 
Halley, Edmond, 534 
Hamilton, William Rowan, 57 
harmonic series, 447, 455, 475, 514 
Heine, Eduard, 58, 178 
Heine—Borel Theorem, 103 
Henstock—Kurzweil integral, 231 
Hermite, Charles, 57 
Heron of Alexandria, 315 
Hilbert, David, 59 
Hipparchus 

of Nicaea, 394 

of Rhodes, see Hipparchus of Nicaea 
horizontal asymptote, 322 
Hudde, Johann, 227 
Huygens, Christiaan, 227 


Identity Law 
for Addition, 13, 20, 29, 43, 63 
for Multiplication, 6, 13, 20, 29, 43, 63 
improper integral, 342, 344, 346 
Type 1, 342 
Type 2, 345 
improperly integrable, 342, 344, 346 
increasing, 207, 412 
strictly, 207, 412 
indefinite integral, 277 
indeterminate form, 332 
induction, 83 
inductive 
hypothesis, 84 
reasoning, 83 
set, 76 
step, 84 
infimum, 47, 63 
infinitely differentiable, 189 
inner content, 296 
integers, 12, 79 
addition, 12 
axioms, 21 
even, 128 
less than, 12 
less than or equal to, 12 
multiplication, 12 
negative, 12, 14, 23 
odd, 128 
positive, 14, 23 


integrable, 235 
improperly, 342, 344, 346 
locally, 342 
integral 
domain, 19 
gauge, 231 
generalized Riemann, 231 
Henstock—Kurzweil, 231 
improper, 342, 344, 346 
indefinite, 277 
Lebesgue, 231, 297 
lower, 254 
Riemann, 231, 235 
Riemann-Stieltjes, 242 
upper, 254 
Integral Test, 454 
interior, 70, 294 
Intermediate Value Theorem, 163 
interval, 70 
closed, 2, 70 
closed bounded, 70 
closed unbounded, 70 
endpoint, 70 
half-open, 70 
interior, 70 
left unbounded, 70, 322 
non-degenerate, 70 
non-degenerate closed bounded, 70 
non-degenerate open bounded, 70 
of convergence, 477 
open, 70 
open bounded, 70 
open unbounded, 70 
right unbounded, 70, 322 
Inverses Law 
for Addition, 13, 20, 29, 43, 63 
for Multiplication, 29, 43, 63 
irrational 
cut, 36 
numbers, 80 


Johann Müller of Königsberg, see
Regiomontanus

Jones, William, 394 

Jordan measure, 297, 310 

Jordan, Camille, 319 

Jyesthadeva, 534 


Kepler, Johannes, 174, 227, 314 
Kronecker, Leopold, 58


l’Hôpital’s Rule, 334
l’Hôpital, Guillaume de, 355
Lagrange Form of the Remainder Theorem,
Lagrange, Joseph-Louis, 202, 229, 536
Lambert, Johann, 394 
Laplace transform, 341 
Laplace, Pierre-Simon, 485 
least 

element, 20 

upper bound, 47, 63 
Least Upper Bound Property, 48, 64 
Lebesgue 

integral, 231, 297 

measure, 284, 297, 310 
Lebesgue’s Theorem, 287 
Lebesgue, Henri, 319 
left 

endpoint, 70 

unbounded interval, 70, 322 
left-hand limit, 141 


Leibniz, Gottfried von, 56, 175, 228, 236, 


317, 393, 484, 535 
Leonardo of Pisa, 53, 393 
less than 

integers, 12 
natural numbers, 8 
rational numbers, 28 
real numbers, 42 
less than or equal to 
integers, 12 
natural numbers, 8 
rational numbers, 28 
real numbers, 20, 42, 66 
Levi ben Gerson, 54 
limit, 132, 322, 323, 402 
left-hand, 141 
one-sided, 141 
right-hand, 141 
superior, 479 
to infinity, 322 
Type 1, 322 
Type 2, 322 
Limit Comparison Test, 453 
Lindemann, Ferdinand von, 57, 394 
Liouville, Joseph, 57 
Lipschitz 
condition, 162, 508
constant, 162 
Liu Hui, 393 
local 
extremum, 209 
maximum, 209 
minimum, 209 
locally integrable, 342 
logarithm function, 359 
with base a, 367 
lower 
bound, 47, 63 
greatest, 47, 63 
cut, 40, 49 
integral, 254 


Machin, John, 393 
Maclaurin 

polynomial, 515 

series, 515 
Maclaurin, Colin, 486, 536 
Madhava of Sangamagramma, 534 
Mathematical Induction 

Principle of, 83 

Variant, 85 

Maurolycus, Franciscus, 55 
maximum, 92 

global, 209 

local, 209 
Mean Value Theorem, 200 
measure 

Jordan, 297, 310 

Lebesgue, 284, 297, 310 

zero, 284, 285 
Mengoli, Pietro, 450, 484 
Méray, Charles, 58 
Mercator, Nicolaus, 398, 534 
minimum 

global, 209 

local, 209 
monotone, 208, 412 

strictly, 208, 412 
Monotone Convergence Theorem, 412, 414 
multiplication 

integers, 12 

natural numbers, 6 

rational numbers, 28 

real numbers, 43 


Multiplication Law for Order, 13, 20, 29, 43, 
63 
multiplicative inverse 
rational numbers, 28 
real numbers, 43 


nth root, 171
Napier, John, 397 
natural logarithm function, 359 
natural numbers, 3, 23, 76 

addition, 5 

less than, 8 

less than or equal to, 8 

multiplication, 6 
negative, 69 

integers, 12, 14, 23 

part, 469 

rational numbers, 28 

real numbers, 42 
Neile, William, 316 
Nested Interval Theorem, 428 
Newton’s Method, 437 
Newton, Isaac, 175, 228, 317, 393, 484, 534 
Nicholas of Cusa, 172 
No Zero Divisors Law, 13, 20, 66 
non-degenerate 

interval, 70 

rectangle, 294 
non-negative, 69 
Non-Triviality, 13, 20, 29, 44, 63 
numbers 

algebraic, 388 

Fibonacci, 431 

irrational, 80 

natural, 3, 23, 76 

rational, 27, 28, 80 

real, 42 

transcendental, 33, 388, 522 
numeral 

Roman, 113 


odd 
function, 265 
integers, 128 
one-sided 
derivative, 189 
limit, 141 
open 
bounded interval, 70 


non-degenerate, 70 
interval, 70 
unbounded interval, 70 

operation 
binary, 2 

closed, 2 
unary, 2 

closed, 2 

order relation, 41 
ordered 
field, 30, 47, 62 

axioms, 62 
integral domain 

axioms, 20 
set, 41 

Oresme, Nicole, 53, 172, 226, 314, 397, 447, 
483 
outer content, 296 


p-series, 455 
Parmenides of Elea, 354 
partial sum, 445, 503 
sequence of, 445, 503 
Pascal, Blaise, 55, 175, 227, 316 
Peano Postulates, 3, 23, 77 
Peano, Giuseppe, 59, 319 
period, 371 
periodic 
extension, 372 
function, 371, 528 
Picard iteration, 508 
Picard, Charles Emile, 508 
Pigeonhole Principle, 125 
place value system, 114 
Plato, 52, 392 
pointwise convergent, 490, 503 
polygonal sum, 304 
polynomial function, 90 
Pope Sylvester II, 54 
positive, 69 
integers, 14, 23 
part, 469 
power series, 474 
represented by, 510 
Principle 
of Mathematical Induction, 83 
Variant, 85 
Well-Ordering, 9, 21, 39, 78, 100, 102, 114 
probability, 341 
Product Rule, 193

proof by induction, 83 

Ptolemy, Claudius, 393 

Pythagoras of Samos, 52 
Pythagorean Theorem, 304, 306, 382 


Quotient Rule, 193, 260 


radius of convergence, 477 
Raphson, Joseph, 437 
Ratio Test, 461, 474 
rational cut, 36 
rational numbers, 27, 28, 80 

addition, 28 

division, 31 

less than, 28 

less than or equal to, 28 

multiplication, 28 

multiplicative inverse, 28 

negative, 28 

subtraction, 31 
real numbers, 42 

addition, 42 

axioms, 64 

division, 66 

extended, 328 

less than, 42 

less than or equal to, 20, 42, 66 

multiplication, 43 

multiplicative inverse, 43 

negative, 42 

subtraction, 66 
rearrangement, 467 
Recorde, Robert, 55 
rectangle, 294 

area, 295 

interior, 294 

non-degenerate, 294 
rectifiable, 306 
recursive 

definition, 86 

description, 86 
Regiomontanus, 396 
region 

between the graphs, 299 

under the graph, 299 
remainder 

Taylor polynomial, 519 
represented by a power series, 510 
Rheticus, Georg Joachim, 396
Riemann 

integrable, 235 

integral, 231, 235 

sum, 231, 234 
Riemann, Georg Friedrich Bernhard, 319, 470

Riemann-Stieltjes 

integrable, 242 

integral, 242 

sum, 241 
Ries, Adam, 55 
right 

endpoint, 70 

unbounded interval, 70, 322 
right-hand limit, 141 
Robert of Chester, 395 
Roberval, Gilles de, 227, 315 
Robinson, Abraham, 179 
Rolle’s Theorem, 199 
Roman numeral, 113 
root 

nth, 171

square, 171 
Root Test, 479 
Russell, Bertrand, 58 


Sarasa, Alfonso Antonio de, 397 
sawtooth function, 527, 533 
secant line, 220 
slope of, 221 
second derivative, 188 
sequence, 115, 400 
bounded, 405 
above, 405 
below, 405 
Cauchy, 417 
constant, 403 
of functions, 490 
partial sums, 445, 503 
series, 116, 444 


alternating harmonic, 457, 460, 467, 473, 475, 514
geometric, 285, 447 
harmonic, 447, 455, 475, 514 
Maclaurin, 515 
of functions, 502 
power, 474 
rearrangement, 467 


sum, 445 

Taylor, 515 

telescoping, 446 
set 

Cantor, 286, 430, 435 

inductive, 76 

ordered, 41 
signed area, 302 
Simpson, Thomas, 437 
sine, 375 
slope of the secant line, 221 
Sluse, René de, 227 
smooth, 189 
Somayaji, Nilakantha, 534 
special polygon, 294 

area, 295 
spiral 

Archimedean, 226 
squarable, 297 
square, 66 

root, 101, 171 
step function, 247 
Stevin, Simon, 55, 173, 314 
Stirling, James, 486, 536 
strictly 

decreasing, 207, 412 

increasing, 207, 412 

monotone, 208, 412 
subsequence, 415 
subtraction 

rational numbers, 31 

real numbers, 66 
Suiseth, see Swineshead, Richard 
sum 

formal, 444 

of series, 445 

partial, 445, 503 
supremum, 47, 63 
Swineshead, Richard, 483 
symmetric derivative, 191 
symmetrically differentiable, 191 


Taylor 
polynomial, 515 
remainder, 519 
series, 515 
Taylor’s Theorem, 202, 283, 519 
Taylor, Brook, 536 
telescoping series, 446 


term 
sequence, 400 
sequence of functions, 490 
series, 444 
series of functions, 502 
ternary expansion, 431 
Test 
Ratio, 461, 474 
Root, 479 
Theaetetus of Athens, 52 
Theodorus of Cyrene, 52 
Theorem 
Abel’s, 524 
Bolzano–Weierstrass, 417
Cauchy Completeness, 419 
Cauchy’s Mean Value, 201 
Extreme Value, 163, 212 
Heine–Borel, 103
Intermediate Value, 163 
Lagrange Form of the Remainder, 519 
Lebesgue’s, 287 
Mean Value, 200 
Monotone Convergence, 412, 414 
Nested Interval, 428 
Pythagorean, 304, 306, 382 
Rolle’s, 199 
Taylor’s, 202, 283, 519 
Torricelli, Evangelista, 227, 316, 355, 484 
transcendental numbers, 33, 388, 522 
Transitive Law, 13, 20, 29, 43, 63 
Triangle Inequality, 71, 311 
Trichotomy Law, 8, 13, 20, 29, 43, 63 
twice differentiable, 188 
Type 1
improper integral, 342 
limit to infinity, 322
Type 2 

improper integral, 345 

limit to infinity, 322 


unary operation, 2 
closed, 2 
uniformly 
continuous, 158 
convergent, 493, 503 
upper 
bound, 47, 63 
least, 47, 63 
cut, 40, 49 
integral, 254 


Valerio, Luca, 173 

van Heuraet, Hendrik, 316 
van Schooten, Frans, 484 
Varahamihira, 395 

variable, 90 

vertical asymptote, 322 
Viète, François, 55, 393, 484
Volterra, Vito, 434 


Wallis, John, 175, 316, 355, 393, 484, 535

Weierstrass, Karl, 58, 178, 440, 528, 537 

Well-Ordering Principle, 9, 21, 39, 78, 100, 102, 114

Wren, Christopher, 316 


Yi Xing, 395 


Zeno of Elea, 172, 354, 439 
Zu Chongzhi, 393 